I've been researching for a long time a way to read a pdf document that contains Sinapi Inputs tables and save the data in my database, and I do not do the minimum of how to do it, could anyone give a tip? >
pdf link here
more complex pdf link here
Until it is possible to read, however, it is only feasible if the PDF maintains a "clean" format (with well defined rows and columns, without multiline, etc). Even though a change in the layout may break all the code made for reading the PDF.
In most cases a viable solution would be to transform the PDF into another format: HTML, TXT, Xls, etc.
Here has a good online tool for PDF to HTML conversion that would greatly facilitate reading in several language (including C #). Here's an example of what your document looks like:
Document converted to HTML:
Becausethedocumentdoesnothavetableswithadefaultset,conversionmakesHTMLtrickytoread,forexamplewith HtmlAgilityPack
One of the tools to convert PDF into a "readable" format for a programming language is Able2Extract
See the settings and how your document was converted to XLS:This is the best conversion option because it allows you to align / select only the required text
Configuration: Selection of table and columns for conversion only
AfreetooltoextractPDFdata: PDF Multitool utility
Convertedtable,nowjustcreatecodetoreadXLS
CertainlythecodetoreadXLSismuchmorepracticalthanforPDF
string con = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\temp\test.xls;Extended Properties='Excel 8.0;HDR=Yes;'"
using(OleDbConnection connection = new OleDbConnection(con))
{
connection.Open();
OleDbCommand command = new OleDbCommand("select * from [Sheet1$]", connection)
using(OleDbDataReader dr = command.ExecuteReader())
{
while(dr.Read())
{
var row1Col0 = dr[0];
Console.WriteLine(row1Col0);
}
}
}
Some of the various examples available on the WEB: > Here and Here