What is the best strategy for loading and persisting large volumes of data with Spring?

1

I need to implement a function in a Spring and Hibernate project that updates some information on every record in my table. The idea is to load the records into the application, process the data based on some of the values in those records, and then persist the entire mass of data.

The values to be updated are different for each record, and depend on the data already persisted in each one.

The table contains some 200,000 records, and I'd like to know the best strategy for performing this loading, processing, and persistence without creating bottlenecks in my application or database.

asked by anonymous 03.07.2018 / 14:56

3 answers

3

Spring has a tool suited exactly for this, called Spring Batch.

In a free translation:


Many applications in the enterprise environment require bulk data processing to perform business operations in mission-critical environments. These operations involve automated, complex processing of large volumes of information, run most of the time without user interaction. They typically include event-based operations (for example, month-end calculations or notices), the periodic application of complex business rules processed repetitively over very large data sets (for example, determining insurance benefits or custom promotions), and the integration of information from internal and external systems that typically requires formatting, validation, and record processing.

We recently used this tool in a project to migrate data from one system to another. We loaded a large amount of data through files exported from the old database and performed various processing steps over it: correcting information, checking which records already existed in the database, moving images from one directory to another, inserting the information into the current database, saving problematic records to a log file, and so on.

At first it may scare you, but it's very simple to use.
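To make the idea concrete, here is a minimal sketch of a chunk-oriented job, assuming Spring Batch 4 with JPA; the Registro entity, its recalcularValores() method, and the chunk size of 500 are illustrative assumptions, not part of the original answer:

import javax.persistence.EntityManagerFactory;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.database.JpaItemWriter;
import org.springframework.batch.item.database.JpaPagingItemReader;
import org.springframework.batch.item.database.builder.JpaPagingItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class AtualizacaoConfig {

    // Reads the table page by page, so the 200,000 rows never sit in memory at once.
    @Bean
    public JpaPagingItemReader<Registro> reader(EntityManagerFactory emf) {
        return new JpaPagingItemReaderBuilder<Registro>()
                .name("registroReader")
                .entityManagerFactory(emf)
                .queryString("select r from Registro r order by r.id")
                .pageSize(500)
                .build();
    }

    // Applies the business rule; the new values depend on each record's own data.
    @Bean
    public ItemProcessor<Registro, Registro> processor() {
        return registro -> {
            registro.recalcularValores(); // hypothetical domain method
            return registro;
        };
    }

    // Merges each processed chunk back through JPA in one transaction.
    @Bean
    public JpaItemWriter<Registro> writer(EntityManagerFactory emf) {
        JpaItemWriter<Registro> writer = new JpaItemWriter<>();
        writer.setEntityManagerFactory(emf);
        return writer;
    }

    @Bean
    public Step atualizarStep(StepBuilderFactory steps,
                              JpaPagingItemReader<Registro> reader,
                              ItemProcessor<Registro, Registro> processor,
                              JpaItemWriter<Registro> writer) {
        return steps.get("atualizarStep")
                .<Registro, Registro>chunk(500)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }

    @Bean
    public Job atualizarJob(JobBuilderFactory jobs, Step atualizarStep) {
        return jobs.get("atualizarJob").start(atualizarStep).build();
    }
}

Reading in pages and writing in chunks keeps memory usage flat and commits one transaction per chunk, and Spring Batch records the progress of each step, so a failed run can be restarted where it stopped.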

04.07.2018 / 02:24
0

Another strategy would be to bring this data into memory in pages (load the data into Java and do all the processing there), then update it and do a batch insert into another table; afterwards you can drop the old table. This process is faster than running an update in the database. If you follow this approach, I recommend backing up the table that will be deleted. If you'd rather not use a programming language, there are data-processing tools such as Pentaho (which does much more than data processing). I hope this helps. :)
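A minimal sketch of this approach, assuming Spring Data JPA and a JdbcTemplate; the Registro entity, RegistroRepository, calcularNovoValor() rule, and registro_novo target table are all hypothetical names for illustration:

import java.util.List;
import java.util.stream.Collectors;

import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Sort;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;

// Hypothetical repository for the existing table.
interface RegistroRepository extends JpaRepository<Registro, Long> {}

@Service
public class MigracaoService {

    private final RegistroRepository repo;
    private final JdbcTemplate jdbcTemplate;

    public MigracaoService(RegistroRepository repo, JdbcTemplate jdbcTemplate) {
        this.repo = repo;
        this.jdbcTemplate = jdbcTemplate;
    }

    public void migrar() {
        int pagina = 0;
        Page<Registro> page;
        do {
            // Load one page at a time instead of all 200,000 rows at once.
            page = repo.findAll(PageRequest.of(pagina++, 1000, Sort.by("id")));

            List<Object[]> lote = page.getContent().stream()
                    // calcularNovoValor() stands in for the business rule.
                    .map(r -> new Object[]{ r.getId(), r.calcularNovoValor() })
                    .collect(Collectors.toList());

            // Insert the processed rows into the new table in a single batch.
            jdbcTemplate.batchUpdate(
                    "insert into registro_novo (id, valor) values (?, ?)", lote);
        } while (page.hasNext());
        // Once registro_novo is validated, the old table can be dropped.
    }
}

jdbcTemplate.batchUpdate sends each page as a single JDBC batch, which is where the speedup over row-by-row updates comes from.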

04.07.2018 / 02:01
-1

In this case, when you have many records to display on screen, you can use pagination, which controls how many records are returned per GET request. An example from an application I wrote:

In the controller:

@RequestMapping(value = "/page", method = RequestMethod.GET)
public ResponseEntity<Page<CategoriaDTO>> findAll(
        @RequestParam(value = "page", defaultValue = "0") Integer page,
        @RequestParam(value = "linesPerPage", defaultValue = "24") Integer linesPerPage,
        @RequestParam(value = "orderBy", defaultValue = "nome") String orderBy,
        @RequestParam(value = "direction", defaultValue = "ASC") String direction) {
    // Fetch one page of entities and map it to DTOs before returning.
    Page<Categoria> list = service.findPage(page, linesPerPage, orderBy, direction);
    Page<CategoriaDTO> listDTO = list.map(obj -> new CategoriaDTO(obj));
    return ResponseEntity.ok().body(listDTO);
}

In the service:

public Page<Categoria> findPage(Integer page, Integer linesPerPage, String orderBy, String direction) {
    // Delegates paging, sorting, and direction to Spring Data.
    PageRequest pageRequest = PageRequest.of(page, linesPerPage, Direction.valueOf(direction), orderBy);
    return repo.findAll(pageRequest);
}

No need to do anything in the Repository!
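For example, assuming the controller above is mapped under a hypothetical /categorias path, one page at a time can be requested like this:

GET /categorias/page?page=0&linesPerPage=24&orderBy=nome&direction=ASC

Each call returns 24 records plus paging metadata, so the client never pulls the whole table at once.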

01.08.2018 / 03:20