Problem reading a very large .xlsx file in Java

I'm working on a Java web application in which I have a method that should read a .xlsx file using Apache POI:

public static void xlsx(String arquivo) throws IOException {

    // try-with-resources closes the stream even if reading fails
    try (FileInputStream file = new FileInputStream(new File(arquivo))) {
        XSSFWorkbook workbook = new XSSFWorkbook(file);
        XSSFSheet sheet = workbook.getSheetAt(0);
        Iterator<Row> rowIterator = sheet.iterator();
        while (rowIterator.hasNext()) {
            Row row = rowIterator.next();
            Iterator<Cell> cellIterator = row.cellIterator();
            while (cellIterator.hasNext()) {
                Cell celula = cellIterator.next();
                /* here I read each cell and apply
                   the appropriate handling
                   to each field.
                */
            }
        }

    } catch (IOException e) {
        e.printStackTrace();
        // pass the exception itself as the cause; e.getCause() may be null
        throw new IOException("Error processing file.", e);
    }
}

The method works correctly. However, it will probably have to process files with many thousands of rows, for example about 25 to 300 thousand. When processing a very large file, I get the following exception:

(http-localhost-127.0.0.1-8080-4) Servlet.service() for servlet RestServlet threw exception: org.jboss.resteasy.spi.UnhandledException: java.lang.OutOfMemoryError: Java heap space

I would like to know how to avoid this kind of error. Is there a way, for example, to read and process the .xlsx file 1000 rows at a time, or is there some other solution?

    
asked by anonymous 31.07.2014 / 16:56

1 answer

1 Read the complete file:

Increase the heap size of the JVM:

java -Xmx80m yourprogram // 80 megabytes (just to show the syntax)
java -Xmx6g yourprogram // 6 gigabytes

Note that the JVM heap is limited to roughly 2 GB on 32-bit architectures (the exact ceiling depends on the operating system).
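
To check that the -Xmx setting actually reached the JVM that runs your code (in an application server the flag usually has to go into the server's own startup options, not your program's command line), a small sketch like the one below can help. The class name HeapInfo and the method maxHeapMegabytes are just illustrative names; Runtime.maxMemory() is the standard JDK call.

```java
public class HeapInfo {

    // Maximum amount of memory the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it as `java -Xmx6g HeapInfo` should report a value close to the requested size; the exact number can differ slightly between JVMs.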

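2 Large input and partial reading

To deal with a large input using little memory, the best option I know of is Apache POI's SXSSF. Bear in mind that SXSSF is a streaming API for writing spreadsheets; for reading, the low-memory counterpart in POI is the XSSF event (SAX) API.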

SXSSFWorkbook wb = new SXSSFWorkbook(-1); // turn off automatic flushing; rows accumulate in memory
// Then control the flushing manually:
if (rownum % NOR == 0) {
    ((SXSSFSheet) sh).flushRows(NOR); // keep the last NOR rows in memory and flush the rest
    //...
}

For more details, I suggest you study the example at the link posted above.
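
As a rough illustration of the "process N rows at a time" idea from the question, here is a sketch that does not use POI at all. It relies only on the JDK: an .xlsx file is a ZIP archive of XML parts, so the sheet XML can be streamed with StAX without loading the whole workbook. Everything named here (XlsxBatchReader, readInBatches, SHEET_ENTRY) is hypothetical, and the sketch rests on several assumptions: the first sheet is stored as xl/worksheets/sheet1.xml (common, but really determined by the workbook's relationship files), only the raw text of each <v> element is returned (for shared-string cells that is an index into xl/sharedStrings.xml, which is not resolved here), and cells without a <v> element are skipped.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XlsxBatchReader {

    // Assumption: the first sheet lives at this path inside the archive.
    static final String SHEET_ENTRY = "xl/worksheets/sheet1.xml";

    // Streams the sheet and hands rows to onBatch in groups of batchSize.
    // Returns the total number of rows seen.
    public static int readInBatches(InputStream xlsx, int batchSize,
                                    Consumer<List<List<String>>> onBatch)
            throws IOException, XMLStreamException {
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (SHEET_ENTRY.equals(entry.getName())) {
                    return parseSheet(zip, batchSize, onBatch);
                }
            }
        }
        throw new IOException("Sheet entry not found: " + SHEET_ENTRY);
    }

    private static int parseSheet(InputStream sheetXml, int batchSize,
                                  Consumer<List<List<String>>> onBatch)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader xml = factory.createXMLStreamReader(sheetXml);

        List<List<String>> batch = new ArrayList<>();
        List<String> row = null;
        int total = 0;

        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = xml.getLocalName();
                if ("row".equals(name)) {
                    row = new ArrayList<>();
                } else if ("v".equals(name) && row != null) {
                    // Raw cell value; shared strings appear as an index here.
                    row.add(xml.getElementText());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && "row".equals(xml.getLocalName())) {
                batch.add(row);
                row = null;
                total++;
                if (batch.size() == batchSize) {
                    onBatch.accept(batch);      // process this group of rows
                    batch = new ArrayList<>();  // start a fresh batch so the old one can be collected
                }
            }
        }
        if (!batch.isEmpty()) {
            onBatch.accept(batch); // last, possibly smaller, batch
        }
        xml.close();
        return total;
    }
}
```

With this shape, a call such as `XlsxBatchReader.readInBatches(new FileInputStream(arquivo), 1000, rows -> process(rows))` keeps at most about 1000 rows in memory at a time, where process stands for whatever per-batch handling you need. For real workloads, POI's own event API handles shared strings, styles and sheet lookup, which this sketch deliberately leaves out.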

    
31.07.2014 / 18:42