Regular Expressions

3

I'm having some trouble putting together a regular expression. I'm trying to work with this code:

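<?php
// Download the page; $html holds the raw HTML of the page, not its URL.
$html = file_get_contents('http://ciagri.iea.sp.gov.br/precosdiarios/');
// $expressao is the regular expression I still need to write.
preg_match_all($expressao, $html, $conteudo);
// $conteudo is an array of matches, so it needs print_r() rather than echo.
print_r($conteudo);
?>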

I need to extract the prices from HTML like this:

<tr style="background-color:White;">
    <td style="width:170px;">
        Mandioca para mesa
    </td>
    <td style="width:120px;">
        Mogi Mirim
    </td>
    <td align="right" style="width:70px;">
        11,50
    </td>
    <td align="center" style="width:70px;">
        cx.23 kg
    </td>
    <td style="width:200px;">
        <div id="ctl00_ContentPlaceHolder1_gridRecebidos_ctl95_PanelGridObs">
        </div>
    </td>
</tr>
<tr>
    <td style="width:170px;">
        Mandioca para mesa
    </td>
    <td style="width:120px;">
        Pindamonhangaba
    </td>
    <td align="right" style="width:70px;">
        28,00
    </td>
    <td align="center" style="width:70px;">
        cx.23 kg
    </td>
    <td style="width:200px;">
        <div id="ctl00_ContentPlaceHolder1_gridRecebidos_ctl96_PanelGridObs">
        </div>
    </td>
</tr>
<tr style="background-color:White;">
    <td style="width:170px;">
        Mandioca para mesa
    </td>
    <td style="width:120px;">
        Sorocaba
    </td>
    <td align="right" style="width:70px;">
        8,79
    </td>
    <td align="center" style="width:70px;">
        cx.23 kg
    </td>
    <td style="width:200px;">
        <div id="ctl00_ContentPlaceHolder1_gridRecebidos_ctl97_PanelGridObs">
        </div>
    </td>
</tr>

To get the price for each city, what would be the best pattern to use?

    
asked by anonymous 13.01.2015 / 17:48

2 answers

5

Ideally, you would use XPath to get these prices. For the page you linked, it would look like this:

$dom = new DOMDocument;
$dom->loadHTMLFile("http://ciagri.iea.sp.gov.br/precosdiarios/");

$xpath = new DOMXPath($dom);
// This query selects every TD at position 3 in the first table with the class "tabela_dados"
$nodes = $xpath->query("(//table[@class='tabela_dados'])[1]/tr/td[position()=3]");

foreach ($nodes as $node) {
    // nodeValue includes the whitespace around the price, so trim it before printing
    echo trim($node->nodeValue) . "\n"; // prints every price
}
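
If you also need to know which city each price belongs to, the same XPath idea can be applied row by row. The following is only a sketch that runs offline: the inline HTML is two rows copied from the question, the wrapping table with the class tabela_dados is an assumption based on the query above, and the number conversion assumes Brazilian formatting (comma as decimal separator, dot as thousands separator).

```php
<?php
// Two rows from the question, wrapped in a table (the wrapper is an assumption).
$html = <<<'HTML'
<table class="tabela_dados">
<tr style="background-color:White;">
    <td style="width:170px;">
        Mandioca para mesa
    </td>
    <td style="width:120px;">
        Mogi Mirim
    </td>
    <td align="right" style="width:70px;">
        11,50
    </td>
</tr>
<tr>
    <td style="width:170px;">
        Mandioca para mesa
    </td>
    <td style="width:120px;">
        Pindamonhangaba
    </td>
    <td align="right" style="width:70px;">
        28,00
    </td>
</tr>
</table>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$prices = [];

// Visit each row, then read its cells relative to that row.
foreach ($xpath->query("//table[@class='tabela_dados']//tr") as $row) {
    $cells = $xpath->query("td", $row);
    if ($cells->length < 3) {
        continue; // skip header or malformed rows
    }
    $city = trim($cells->item(1)->nodeValue);
    $raw  = trim($cells->item(2)->nodeValue);
    // "1.234,56" -> "1234.56": drop thousands dots, then turn the comma into a dot.
    $prices[$city] = (float) str_replace(['.', ','], ['', '.'], $raw);
}

print_r($prices);
```

Keying the result by city is just one possible layout. If the same city can appear for several products, an array of rows would be a safer choice.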
    
13.01.2015 / 18:15
4

I was able to do it with this regex:

<tr[^>]*>\s*<td[^>]*>[^<]*<\/td>\s*<td[^>]*>[^<]*<\/td>\s*<td[^>]*>\s*(\S*)

It's important to collect every match, not just the first one. The price is in capture group 1 of each match.
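
As a rough sketch of how this regex could be plugged into the preg_match_all call from the question (the inline HTML is two rows copied from the question so that the example runs without a network request, and the variable names are only illustrative):

```php
<?php
// Sample input: two rows from the question, formatted as on the page.
$html = <<<'HTML'
<tr style="background-color:White;">
    <td style="width:170px;">
        Mandioca para mesa
    </td>
    <td style="width:120px;">
        Mogi Mirim
    </td>
    <td align="right" style="width:70px;">
        11,50
    </td>
</tr>
<tr>
    <td style="width:170px;">
        Mandioca para mesa
    </td>
    <td style="width:120px;">
        Pindamonhangaba
    </td>
    <td align="right" style="width:70px;">
        28,00
    </td>
</tr>
HTML;

// The regex from this answer, wrapped in "/" delimiters.
$pattern = '/<tr[^>]*>\s*<td[^>]*>[^<]*<\/td>\s*<td[^>]*>[^<]*<\/td>\s*<td[^>]*>\s*(\S*)/';

// preg_match_all collects every match; $matches[1] holds capture group 1.
preg_match_all($pattern, $html, $matches);

print_r($matches[1]); // the two prices: 11,50 and 28,00
```

Note that (\S*) relies on whitespace following the price, as it does on this page. If a cell were written as <td>11,50</td> on one line, the capture would also include </td> . In that case ([^<\s]*) would be a more defensive capture group.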

How does this expression work?

Let's break it down into parts:

  • <tr[^>]*> - Starts with <tr , then uses [^>]* to skip everything up to the next > and finally consumes that > . That is, it consumes <tr blablabla> . It also works if there is only <tr> .
  • \s* - Consumes any run of whitespace and line breaks (possibly none).
  • <td[^>]*>[^<]*<\/td>\s* - Starts with <td , then uses [^>]* to skip everything up to > and consumes that > . Next, [^<]* consumes the cell text up to the following < , <\/td> consumes the closing </td> , and \s* consumes the whitespace and line breaks after it. That is, it consumes the first <td blabla>blablabla</td> .
  • <td[^>]*>[^<]*<\/td>\s* - The same sub-pattern as in the previous item, repeated to consume the second <td blabla>blablabla</td> .
  • <td[^>]*>\s* - Consumes the opening <td blabla> of the third cell and the whitespace and line breaks after it. Right after that comes the price.
  • (\S*) - Captures all the following non-whitespace characters and stops at the first whitespace character, which is not consumed. That is, it captures the price.
  • Tested here . To check, put the regex in the first field and g (the global flag) in the second. In the area below, paste the text you want to search (in this case, the HTML).
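
The breakdown above can also be written directly into the pattern using PCRE's extended mode (the x modifier), where whitespace in the pattern is ignored and # starts a comment. This is only a sketch: the ~ delimiter and the nowdoc string are choices made for this example, and the inline HTML is one row copied from the question.

```php
<?php
// Same regex as above, split into labeled parts with the x modifier.
$pattern = <<<'REGEX'
~
<tr[^>]*>\s*              # opening <tr ...> plus following whitespace
<td[^>]*>[^<]*<\/td>\s*   # first cell: product name
<td[^>]*>[^<]*<\/td>\s*   # second cell: city
<td[^>]*>\s*              # opening tag of the third cell
(\S*)                     # group 1: the price
~x
REGEX;

// One row from the question, used as sample input.
$html = <<<'HTML'
<tr style="background-color:White;">
    <td style="width:170px;">
        Mandioca para mesa
    </td>
    <td style="width:120px;">
        Sorocaba
    </td>
    <td align="right" style="width:70px;">
        8,79
    </td>
</tr>
HTML;

preg_match_all($pattern, $html, $matches);

print_r($matches[1]); // the single price: 8,79
```

The matches are the same as with the one-line version. Only the way the pattern is written changes.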

        
    13.01.2015 / 18:13