Get data from another site through ClassName

0

How can I search for data in another site's HTML?

The html part of the other site is this:

<p class=MsoNormal align=center style='text-align:center'>
  <span style='font-size:10.0pt;font-family:"Arial","sans-serif";mso-fareast-font-family: "Times New Roman";color:black'>COTAÇÕES</span>
  <span style='mso-fareast-font-family: "Times New Roman"'><o:p></o:p></span>
</p>

In my site code I tried to do this:

 $html = new DOMDocument();
 $html->loadHTMLFile('http://www.agropan.coop.br/cotac.htm');

 echo $html->getElementByClassName('MsoNormal').getAttribute("p");

In fact, I'd like to fetch only the "COTAÇÕES" content, looking it up by class name. How should I do it?

    
asked by anonymous 28.04.2017 / 00:04

2 answers

1
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <?php
    error_reporting(E_ALL & ~ E_NOTICE);
    $html = file_get_contents("http://www.agropan.coop.br/cotac.htm");

    $DOM = new DOMDocument();
    // Collect libxml warnings (such as the invalid <o:p> tags) instead of printing them
    libxml_use_internal_errors(true);
    $DOM->loadHTML($html);
    libxml_clear_errors();
    $finder = new DomXPath($DOM);
    $classname = 'MsoNormal';
    $nodes = $finder->query("//*[contains(@class, '$classname')]");
    $result = ''; // initialize before appending, otherwise PHP reports an undefined variable
    foreach ($nodes as $node) {
      $result .= $node->nodeValue."***";
    }

    $result = preg_replace(array("/\t/", "/\s{2,}/", "/\n/"), array("", " ", " "), $result);
    $partes = explode('***',$result);
    $cotacoes=$partes[0];
    $cotacoes = trim(preg_replace('/[\r\n]+/', '', $cotacoes));
    $cotacoes = str_replace("COTAÇÕES", "", $cotacoes);

    echo $cotacoes;     

    ?>
  

Other ways to avoid the warnings caused by invalid tags ("Tag o:p invalid in Entity"):

1: Stripping the invalid tags before parsing:

    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <?php
    error_reporting(E_ALL & ~ E_NOTICE);
    $html = file_get_contents("http://www.agropan.coop.br/cotac.htm");

    // Remove the Microsoft Office <o:p> tags that libxml does not recognize
    $search = array("<o:p>", "</o:p>");
    $replace = array("", "");
    $html = str_replace($search, $replace, $html);

    $DOM = new DOMDocument();
    $DOM->loadHTML($html);
    $finder = new DomXPath($DOM);
    $classname = 'MsoNormal';
    $nodes = $finder->query("//*[contains(@class, '$classname')]");
    $result = ''; // initialize before appending, otherwise PHP reports an undefined variable
    foreach ($nodes as $node) {
      $result .= $node->nodeValue."***";
    }

    $result = preg_replace(array("/\t/", "/\s{2,}/", "/\n/"), array("", " ", " "), $result);
    $partes = explode('***',$result);
    $cotacoes=$partes[0];
    $cotacoes = trim(preg_replace('/[\r\n]+/', '', $cotacoes));
    $cotacoes = str_replace("COTAÇÕES", "", $cotacoes);

    echo $cotacoes;     

    ?>

2: Prefixing the call with @ to suppress the warnings, that is, writing @$DOM->loadHTML($html); instead of $DOM->loadHTML($html);

    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <?php
    error_reporting(E_ALL & ~ E_NOTICE);
    $html = file_get_contents("http://www.agropan.coop.br/cotac.htm");

    $DOM = new DOMDocument();
    @$DOM->loadHTML($html);
    $finder = new DomXPath($DOM);
    $classname = 'MsoNormal';
    $nodes = $finder->query("//*[contains(@class, '$classname')]");
    $result = ''; // initialize before appending, otherwise PHP reports an undefined variable
    foreach ($nodes as $node) {
      $result .= $node->nodeValue."***";
    }

    $result = preg_replace(array("/\t/", "/\s{2,}/", "/\n/"), array("", " ", " "), $result);
    $partes = explode('***',$result);
    $cotacoes=$partes[0];
    $cotacoes = trim(preg_replace('/[\r\n]+/', '', $cotacoes));
    $cotacoes = str_replace("COTAÇÕES", "", $cotacoes);

    echo $cotacoes;     

    ?>
  
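One caveat with contains(@class, 'MsoNormal') is that it also matches classes such as MsoNormalTable. Here is a minimal sketch of a whole-token match, assuming a hypothetical helper textByClass() and an inline sample modelled on the markup in the question (with ASCII text to sidestep encoding issues); it is not the only way to do it:

```php
<?php
// Hypothetical helper: returns the trimmed text of every element whose
// class attribute contains $class as a whole, space-separated token.
// Assumes $class contains no quote characters.
function textByClass(string $html, string $class): array
{
    // Collect parser warnings (e.g. "Tag o:p invalid") instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        // Collapse runs of whitespace into single spaces.
        $texts[] = trim(preg_replace('/\s+/', ' ', $node->textContent));
    }
    return $texts;
}

// Sample markup (an assumption, not the real page).
$sample = "<p class=MsoNormal align=center><span>QUOTES</span><span><o:p></o:p></span></p>"
        . "<p class=MsoNormalTable>other</p>";

$found = textByClass($sample, 'MsoNormal');
print_r($found);
```

For the real page you would pass in the string returned by file_get_contents (or cURL) instead of $sample.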

To search by id instead, just replace

    $classname = 'MsoNormal';
    $nodes = $finder->query("//*[contains(@class, '$classname')]");

with

    $id = 'MsoNormal';
    $nodes = $finder->query("//*[contains(@id, '$id')]");
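
Note that contains(@id, ...) is a substring match, so it can return more than one element. A small sketch of the difference, using made-up ids (cotacoes and cotacoes-header are assumptions for illustration, not ids from the real page):

```php
<?php
// Made-up sample: two elements whose ids share a prefix.
$sample = '<div id="cotacoes-header">title</div><div id="cotacoes">values</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings quiet
$dom->loadHTML($sample);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Substring match: both ids contain "cotacoes".
$partial = $xpath->query("//*[contains(@id, 'cotacoes')]");

// Exact match: only the element whose id is exactly "cotacoes".
$exact = $xpath->query("//*[@id='cotacoes']");

echo $partial->length, " ", $exact->length, "\n";
echo $exact->item(0)->textContent, "\n";
```

If you know the full id, the exact form is probably the safer choice.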
    
28.04.2017 / 04:15
0

In PHP, file_get_contents can only read URLs when allow_url_fopen is enabled in php.ini. file_get_contents is meant for reading files and only incidentally handles URLs, so I believe it is better to use cURL, which is built for this purpose and is more widely supported. But feel free to use whichever you prefer.

function file_get_contents_curl($url) {
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);       

    $data = curl_exec($ch);
    curl_close($ch);

    return $data;
}
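
If you are not sure which option your server allows, something like the following could tell you. It is only a sketch: pickFetcher() is a hypothetical name, and it merely checks whether the cURL extension is loaded and whether allow_url_fopen is on.

```php
<?php
// Hypothetical helper: reports which way of fetching a URL is available.
function pickFetcher(): string
{
    // The cURL extension is loaded if curl_init() exists.
    if (function_exists('curl_init')) {
        return 'curl';
    }
    // ini_get() returns a string such as "1" or ""; normalize it to a boolean.
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return 'file_get_contents';
    }
    return 'none';
}

echo pickFetcher(), "\n";
```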

Fetch the source of the page you want and use a regular expression to capture the node with the class you need. I believe this is the best way to do it!

Here is an example expression for this:

^<[a-z]\s[a-z]+\=MsoNormal\salign=center\sstyle=\'text-align\:center\'\>\s+(.*)\s

Maybe this will solve it!
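
To show how a regular expression could be applied with preg_match, here is a rough sketch. The pattern is a simplified one of my own, not the expression above, and the sample string is an assumption modelled on the markup in the question. Keep in mind that regular expressions break easily when the page's HTML changes.

```php
<?php
// Sample markup (an assumption, with ASCII text to sidestep encoding issues).
$sample = "<p class=MsoNormal align=center style='text-align:center'>\n"
        . "  <span style='color:black'>QUOTES</span>\n"
        . "  <span><o:p></o:p></span>\n"
        . "</p>";

// Simplified pattern: a <p> whose first attribute is class=MsoNormal,
// capturing everything up to the first closing </p>.
// Flags: i = case-insensitive, s = let "." match newlines.
$pattern = '~<p\s+class=MsoNormal\b[^>]*>(.*?)</p>~is';

$text = null;
if (preg_match($pattern, $sample, $m)) {
    // Drop the inner tags and the surrounding whitespace.
    $text = trim(strip_tags($m[1]));
}

echo $text, "\n";
```

With the real page, $sample would be the string returned by file_get_contents_curl($url).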

    
28.04.2017 / 16:23