Is there any way in Java to convert a PDF file to a TXT file?
You can try the iText library, which has built-in support for extracting text from PDF files. One way to do this would be:
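// Imports assume the iText 5.x package layout
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintWriter;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

public void parsePdf(String pdf, String txt) throws IOException {
    PdfReader reader = new PdfReader(pdf);
    // try-with-resources closes the writer even if extraction fails
    try (PrintWriter out = new PrintWriter(new FileOutputStream(txt))) {
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        // iText page numbers start at 1
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            TextExtractionStrategy strategy =
                    parser.processContent(i, new SimpleTextExtractionStrategy());
            out.println(strategy.getResultantText());
        }
    } finally {
        reader.close();
    }
}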
Here the pdf parameter is the path of the PDF file to extract the text from, and the txt parameter is the path of the destination TXT file.
The parsePdf snippet was taken from a ready-made example created by the iText developer. The example, along with the resulting TXT file, can be found at this link.
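Since the question is about going from a .pdf file to a .txt file, it may be handy to derive the destination name from the source name instead of typing both. The sketch below is only an illustration: TxtPaths and toTxtPath are hypothetical names, not part of iText, and it assumes the output should sit next to the input with the same base name.

```java
// Hypothetical helper, not part of iText: builds the txt argument for parsePdf.
final class TxtPaths {
    private TxtPaths() {}

    /**
     * Replaces a trailing ".pdf" (in any letter case) with ".txt".
     * If the path does not end in ".pdf", ".txt" is simply appended.
     */
    static String toTxtPath(String pdfPath) {
        int start = pdfPath.length() - 4;
        // regionMatches returns false when start is negative, so short names are safe
        if (pdfPath.regionMatches(true, start, ".pdf", 0, 4)) {
            return pdfPath.substring(0, start) + ".txt";
        }
        return pdfPath + ".txt";
    }
}
```

With that in place, a call could look like parsePdf("report.pdf", TxtPaths.toTxtPath("report.pdf")), which would write the extracted text to report.txt.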