Reading a PDF in .NET

2

I have spent a long time looking for a way to read a PDF document that contains Sinapi input tables and save the data in my database, and I have no idea where to start. Could anyone give me a tip?

pdf link here

more complex pdf link here

    
asked by anonymous 16.05.2015 / 04:14

1 answer

3

Reading it is possible, but it is only feasible if the PDF keeps a "clean" format (well-defined rows and columns, no multiline cells, etc.). Even then, a change in the layout may break all the code written to read the PDF.

In most cases a viable solution is to convert the PDF into another format: HTML, TXT, XLS, etc.
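If the PDF is converted to TXT, splitting each line into columns could look roughly like the sketch below. It assumes the converter separates columns with two or more spaces, which depends on the tool and the document; `TextTableParser` and `ParseRows` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TextTableParser
{
    // Splits each non-empty line into columns wherever two or more
    // whitespace characters appear in a row (assumed column separator).
    public static List<string[]> ParseRows(IEnumerable<string> lines)
    {
        var rows = new List<string[]>();
        foreach (var line in lines)
        {
            // Skip blank lines left over from the conversion.
            if (string.IsNullOrWhiteSpace(line)) continue;

            string[] columns = Regex.Split(line.Trim(), @"\s{2,}");
            rows.Add(columns);
        }
        return rows;
    }
}
```

A call such as `TextTableParser.ParseRows(File.ReadLines(path))` would return one string array per table row. Single spaces inside a description are kept, but a multiline cell would still show up as a separate row, which is why the "clean" layout matters.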

Here is a good online tool for PDF to HTML conversion, which would make the document much easier to read from several languages (including C#). Here's an example of what your document looks like:

Document converted to HTML:

Because the document's tables do not follow a consistent layout, the converted HTML is tricky to read, for example with HtmlAgilityPack.

One tool that converts PDF into a format that is "readable" by a programming language is Able2Extract.

See the settings and how your document was converted to XLS:
  

This is the best conversion option because it lets you align and select only the required text.


Configuration: selecting only the table and columns to convert

A free tool to extract PDF data: PDF Multitool utility

Converted table. Now just write the code to read the XLS.

  

The code to read an XLS is certainly much more practical than the code to read a PDF:

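using System;
using System.Data.OleDb;

// Jet 4.0 reads .xls files and is only available to 32-bit processes.
string con = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\temp\test.xls;Extended Properties='Excel 8.0;HDR=Yes;'";
using (OleDbConnection connection = new OleDbConnection(con))
{
    connection.Open();
    // [Sheet1$] is the worksheet name; HDR=Yes treats the first row as column headers.
    using (OleDbCommand command = new OleDbCommand("select * from [Sheet1$]", connection))
    using (OleDbDataReader dr = command.ExecuteReader())
    {
        while (dr.Read())
        {
            // Value of the first column in the current row.
            var firstColumn = dr[0];
            Console.WriteLine(firstColumn);
        }
    }
}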
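Since the goal is to save the data in a database, the reader loop could fill typed objects instead of printing values. The sketch below is only an illustration: the `InputRow` class, the `InputRowMapper` helper and the column order (code, description, unit, price) are assumptions, not the real Sinapi layout, so adjust them to the converted spreadsheet.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

// Hypothetical shape of one table row; adjust to the real columns.
public sealed class InputRow
{
    public string Code { get; set; }
    public string Description { get; set; }
    public string Unit { get; set; }
    public decimal Price { get; set; }
}

public static class InputRowMapper
{
    // Reads every record from the reader and maps columns 0..3
    // (assumed order: code, description, unit, price) to an InputRow.
    public static List<InputRow> ReadAll(IDataReader reader)
    {
        var result = new List<InputRow>();
        while (reader.Read())
        {
            // Rows without a code are treated as empty spreadsheet rows.
            if (reader.IsDBNull(0)) continue;

            result.Add(new InputRow
            {
                Code = Convert.ToString(reader[0], CultureInfo.InvariantCulture),
                Description = Convert.ToString(reader[1], CultureInfo.InvariantCulture),
                Unit = Convert.ToString(reader[2], CultureInfo.InvariantCulture),
                // A missing price becomes 0; change this if null matters to you.
                Price = reader.IsDBNull(3)
                    ? 0m
                    : Convert.ToDecimal(reader[3], CultureInfo.InvariantCulture)
            });
        }
        return result;
    }
}
```

Because `OleDbDataReader` implements `IDataReader`, the `dr` variable from the code above can be passed directly as `InputRowMapper.ReadAll(dr)`. The resulting list can then be inserted with whatever data access layer the database uses.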

Some of the various examples available on the web: Here and Here

    
16.05.2015 / 19:59