How to convert a PDF file to TXT?

6

Is there any way in Java to convert a PDF file to a TXT file?

    
asked by anonymous 16.11.2015 / 15:15

1 answer

5

You can try the iText library, which has ready-made text-extraction functionality for PDF files. One way to do this would be:

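// Imports assume iText 5.x package names.
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintWriter;

public void parsePdf(String pdf, String txt) throws IOException {
    PdfReader reader = new PdfReader(pdf);
    // try-with-resources flushes and closes the writer even if extraction fails
    try (PrintWriter out = new PrintWriter(new FileOutputStream(txt))) {
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        // iText page numbers start at 1
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            TextExtractionStrategy strategy =
                parser.processContent(i, new SimpleTextExtractionStrategy());
            out.println(strategy.getResultantText());
        }
    } finally {
        reader.close();
    }
}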

Here, the pdf parameter is the path of the PDF file to extract text from, and the txt parameter is the path of the destination TXT file.
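If the TXT file should simply sit next to the PDF under the same name, the destination path can be derived from the source path. The sketch below is only an illustration: the class TxtPaths and the method txtPathFor are hypothetical names, not part of iText, and it assumes the output belongs in the same directory as the input.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of iText): derives the TXT path from a PDF path.
final class TxtPaths {

    private TxtPaths() {
        // utility class, no instances
    }

    // Replaces the last extension of the file name with ".txt",
    // or appends ".txt" if the name has no extension.
    static String txtPathFor(String pdfPath) {
        Path source = Paths.get(pdfPath);
        Path fileName = source.getFileName();
        if (fileName == null) {
            // e.g. a root path such as "/" has no file name
            throw new IllegalArgumentException("Not a file path: " + pdfPath);
        }
        String name = fileName.toString();
        int dot = name.lastIndexOf('.');
        // dot > 0 keeps names like ".hidden" intact instead of treating them as an extension
        String base = (dot > 0) ? name.substring(0, dot) : name;
        // resolveSibling keeps the original directory, if there is one
        return source.resolveSibling(base + ".txt").toString();
    }
}
```

With such a helper, the call could look like parsePdf(pdf, TxtPaths.txtPathFor(pdf)), so only the source path has to be supplied.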

This code snippet was taken from a ready-made example written by the iText developer. The example, along with the resulting TXT file, can be found at this link.

    
16.11.2015 / 23:17