Import an HTML table from a URL and fill a DataTable

    // Requires: using System.Data; using System.Linq; using System.Net; using HtmlAgilityPack;
    string htmlCode = "";

    using (WebClient client = new WebClient())
    {
        // Some servers reject requests that carry no User-Agent header
        client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
        htmlCode = client.DownloadString("http://www.site.html");
    }

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(htmlCode);

    // One DataTable column per <th> header cell
    var headers = doc.DocumentNode.SelectNodes("//tr/th");
    DataTable table = new DataTable();
    foreach (HtmlNode header in headers)
        table.Columns.Add(header.InnerText);

    // One DataTable row per <tr> that contains <td> cells
    foreach (var row in doc.DocumentNode.SelectNodes("//tr[td]"))
        table.Rows.Add(row.SelectNodes("td").Select(td => td.InnerText).ToArray());

This example was posted in another question on Stack Overflow in English. I can't tell whether the DataTable is already filled at this point, and I don't know the names of the table's columns.

    
asked by anonymous 20.09.2016 / 15:50

1 answer

I did not quite understand your problem, but I'll modify the code a bit to try to explain it better.

First, the page you linked in the comments contains more than one table, so let's select the table you want by its id.
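As a minimal sketch of what selecting by id means, the snippet below runs the same kind of XPath against a small inline page instead of the real site. The sample HTML, its header texts and the names `TableByIdSketch`, `SampleHtml` and `HeaderTexts` are all made up for illustration; only the `cognomi` id comes from the code in this answer. It assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class TableByIdSketch
{
    // Hypothetical sample page with two tables; only the second has id="cognomi".
    public const string SampleHtml = @"
<table><tr><th>Other</th></tr><tr><td>ignored</td></tr></table>
<table id='cognomi'>
  <thead><tr><th>Name</th><th>Count</th></tr></thead>
  <tbody><tr><td>Rossi</td><td>3</td></tr></tbody>
</table>";

    // Returns the header texts of the table with the given id,
    // or an empty array when no such table (or no <th>) is found.
    public static string[] HeaderTexts(string html, string tableId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes($"//table[@id='{tableId}']/thead/tr/th");
        return nodes == null
            ? Array.Empty<string>()
            : nodes.Select(th => th.InnerText.Trim()).ToArray();
    }
}
```

Calling `TableByIdSketch.HeaderTexts(TableByIdSketch.SampleHtml, "cognomi")` should return only the headers of the second table, while an id that does not exist yields an empty array instead of a `NullReferenceException`.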

Your final code will look like this:

    string htmlCode = "";

    using (WebClient client = new WebClient())
    {
        client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
        htmlCode = client.DownloadString("http://www.codiceinverso.it/directory-cognomi/cadore.html");
    }

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(htmlCode);
    DataTable table = new DataTable();

    // Select all the header cells (columns)
    var cabecalhos = doc.DocumentNode.SelectNodes("//table[@id='cognomi']/thead/tr/th");
    foreach (HtmlNode col in cabecalhos)
    {
        // Add each column
        table.Columns.Add(col.InnerText);
    }

    // Select all the rows
    var linhas = doc.DocumentNode.SelectNodes("//table[@id='cognomi']/tbody/tr[td]");
    foreach (var row in linhas)
    {
        // Add each row
        table.Rows.Add(row.SelectNodes("td").Select(td => td.InnerText).ToArray());
    }

The generated DataTable will have 2 columns and 10 rows, as can be seen in the images below:

[Image: columns of the DataTable]

[Image: rows of the DataTable]
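To check for yourself whether the DataTable was filled and what its columns are called, something like the following sketch could help. The column names are simply whatever text was passed to `table.Columns.Add`, that is, the `InnerText` of each `th`. `DataTableInspector`, `ColumnNames` and `Dump` are hypothetical helper names, not part of the original code; only `System.Data` and LINQ are assumed.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DataTableInspector
{
    // Returns the column names in the order they were added.
    public static string[] ColumnNames(DataTable table) =>
        table.Columns.Cast<DataColumn>().Select(c => c.ColumnName).ToArray();

    // Builds a plain-text dump: a header line followed by one line per row.
    public static string Dump(DataTable table)
    {
        var lines = new[] { string.Join(" | ", ColumnNames(table)) }
            .Concat(table.Rows.Cast<DataRow>()
                .Select(r => string.Join(" | ", r.ItemArray)));
        return string.Join(Environment.NewLine, lines);
    }
}
```

After the loops above have run, `table.Columns.Count` and `table.Rows.Count` tell you how much was loaded, and `Console.WriteLine(DataTableInspector.Dump(table))` prints the contents. A row count of zero would mean the XPath for the rows matched nothing.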

    
20.09.2016 / 18:04