I'm looking for a way to read PDFs automatically. I receive hundreds of invoices and want to automate extracting their data. What I've tried:
Programs that convert the PDF to txt, which is not very effective because the conversion garbles some values.
Programs that extract by coordinates, but the x, y coordinates sometimes change. For example, a field in the PDF usually takes up one line, but when it takes up two, the layout shifts and the extraction breaks.
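To make the coordinate problem concrete, here is a rough sketch of one idea: instead of reading a fixed x, y position, find a label and take whatever sits on the same line to its right. It assumes some extraction tool already gives me each word with its position. The `Word` type, the `value_right_of` helper, the labels, and the sample data are all made up for illustration and are not part of any library.

```python
from typing import List, NamedTuple, Optional


class Word(NamedTuple):
    """Hypothetical shape of one extracted word with its position."""
    text: str
    x: float  # left edge of the word
    y: float  # vertical position of the word's line


def value_right_of(words: List[Word], label: str, y_tol: float = 3.0) -> Optional[str]:
    """Return the text on the same line as `label`, to its right."""
    # Locate the label wherever it happens to be on the page.
    anchor = next((w for w in words if w.text == label), None)
    if anchor is None:
        return None
    # Keep only words on (roughly) the same line and to the right of the label.
    same_line = [w for w in words
                 if abs(w.y - anchor.y) <= y_tol and w.x > anchor.x]
    same_line.sort(key=lambda w: w.x)
    return " ".join(w.text for w in same_line) or None


# Invented sample: the customer name takes one line here...
invoice_a = [
    Word("Customer:", 40, 100), Word("ACME", 110, 100),
    Word("Total:", 40, 120), Word("150.00", 110, 120),
]
# ...and two lines here, which pushes "Total:" further down the page.
invoice_b = [
    Word("Customer:", 40, 100), Word("ACME", 110, 100),
    Word("Ltd", 110, 112),
    Word("Total:", 40, 132), Word("150.00", 110, 132),
]

print(value_right_of(invoice_a, "Total:"))
print(value_right_of(invoice_b, "Total:"))
```

Both calls find the total even though its y coordinate differs between the two invoices. This only works if the label text is identical on every invoice, which is the kind of stable pattern I'm hoping to find.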
I'm trying to find some pattern, maybe an ID. I've read this documentation link and wanted to see whether I could get at the dictionary; maybe the value I want is stored under the same dictionary on every invoice. Does anyone know of a library that lets me retrieve both the dictionary and the text? I think the most complete library is PDFParser (pdfparser.org), which supports more encodings, but the most it offers is metadata extraction.