Problems with file_get_contents and DOMDocument


I'm trying to download content from a website, but it's giving this warning:

  

DOMDocument::loadHTML(): Unexpected end tag: tr in Entity

The warning is reported for several lines. I also can't get the accented characters to display correctly.

Could anyone help me understand and solve these problems?

$content = http_build_query([
    'Local' => 'Adamantina',
    'Inicio' => '01/01/2015',
    'Final' => '31/12/2015',
]);

$context = stream_context_create([
    'http' => [
        'method'  => 'POST',
        'header'  => 'Content-type: application/x-www-form-urlencoded',
        'content' => $content,
    ]
]);

$contents = utf8_decode(file_get_contents('http://www.ciiagro.sp.gov.br/ciiagroonline/Listagens/BH/LBalancoHidricoLocal.asp', false, $context));

$dom = new DOMDocument();
$dom->loadHTML($contents);
$dom->saveHTML($dom->documentElement);

$xpath = new DomXPath($dom);
$rows = $xpath->query('//table/tr[position()>0]');

foreach ($rows as $row) {
    $tds=$row->getElementsByTagName("td");   

    foreach ($tds as $td) {
        print($td->nodeValue);
        echo "<br>";
    }
}
    
asked by anonymous 21.01.2016 / 13:04

1 answer


Warning problem:

To get rid of the warnings, call libxml_use_internal_errors(). Be aware that this only hides libxml's errors; it does not fix the malformed HTML that causes them.

  

Use the following:

libxml_use_internal_errors(true);
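A minimal sketch of how this might look in practice. The $html string is made up for illustration (it is not taken from the target site), and printing the collected errors is optional:

```php
<?php
// Hypothetical malformed markup: a stray </tr>, as in the warning above.
$html = '<table><tr><td>A</td></tr></tr></table>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// The errors are still available if you want to inspect or log them.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}

// Empty the error buffer and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```

The parser still builds a usable tree from the broken markup, so the rest of the script can carry on as before.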


Accent problem:

To fix the encoding problem, use mb_convert_encoding() to convert the text to HTML entities, but remove the earlier utf8_decode() call first!

  

Use the following:

mb_convert_encoding($td->nodeValue, 'HTML-ENTITIES', 'UTF-8');


  

Change the file_get_contents() line to something like this:

$contents = file_get_contents('http://www.ciiagro.sp.gov.br/ciiagroonline/Listagens/BH/LBalancoHidricoLocal.asp', false, $context);

That is, drop the utf8_decode() call.
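Putting both fixes together, here is a rough sketch. To keep it runnable without the network, $contents is a hypothetical stand-in for the downloaded page; I am assuming the real page is ISO-8859-1 and declares that in a meta tag, which you should check. Also note that on PHP 8.2 and later the 'HTML-ENTITIES' target is deprecated, and mb_encode_numericentity() is one possible replacement.

```php
<?php
// Hypothetical stand-in for the downloaded page: ISO-8859-1 bytes,
// a charset declaration, and a stray </tr> like the one in the warning.
$contents = '<html><head><meta http-equiv="Content-Type" '
    . 'content="text/html; charset=iso-8859-1"></head><body>'
    . "<table><tr><td>Precipita\xE7\xE3o</td><td>12,5</td></tr></tr></table>"
    . '</body></html>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output

$dom = new DOMDocument();
$dom->loadHTML($contents);          // note: no utf8_decode() beforehand
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$cells = [];
foreach ($xpath->query('//table//tr/td') as $td) {
    // DOM hands text back as UTF-8, whatever the source encoding was.
    $cells[] = $td->nodeValue;
}

foreach ($cells as $text) {
    // Non-ASCII characters become entities, so the output is plain ASCII.
    echo mb_convert_encoding($text, 'HTML-ENTITIES', 'UTF-8'), "<br>\n";
}
```

Collecting the cell text in $cells first keeps the parsing separate from the output, so the entity conversion happens only at the point where the text is printed.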

Note:

  • My rule of thumb for when to use HTML-ENTITIES: the output shows a plain ? where an accented character should be, rather than the replacement character (a ? on a black background) or runs of "random" characters. That is only my own heuristic, though; I mostly guess until it works.

  • I think it's best to use cURL instead of file_get_contents().

  • I'm out of time, sorry, I'll try to improve the answer soon.
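
For the cURL suggestion in the notes above, something along these lines might work. fetchWithCurl() is a hypothetical helper name, and the timeout and redirect options are my own assumptions, not requirements of the site:

```php
<?php
// Hypothetical helper: POSTs form fields with cURL and returns the raw body.
function fetchWithCurl(string $url, array $fields): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields), // urlencoded body
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 30,    // seconds; an arbitrary choice
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // Unlike file_get_contents(), cURL can report why the request failed.
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    return $body;
}

// Usage with the values from the question (not executed here):
// $contents = fetchWithCurl(
//     'http://www.ciiagro.sp.gov.br/ciiagroonline/Listagens/BH/LBalancoHidricoLocal.asp',
//     ['Local' => 'Adamantina', 'Inicio' => '01/01/2015', 'Final' => '31/12/2015']
// );
```

The returned string can then be passed to loadHTML() exactly as before.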

        
    21.01.2016 / 13:30