Help with a PHP loop structure - PARSER - simple_html_dom.php

4

I am writing a parser with simple_html_dom.php that pulls all the links from a given page. I can pull the links and store them in an array, but then I run into a problem:

  • The page displays a maximum of 36 items per page.
  • The number of items increases and decreases sporadically.

Situation example:

If the source I am parsing has 133 items, then because of the limit of 36 items per page I have to run the parser 4 times, changing the page number in the URL each time, until all 133 items have been pulled.

What I need:

Pull all 133 items without specifying a static limit for the counter. Since the number of items increases and decreases, the limit has to be dynamic and automatic.

What I've done:

require ("simple_html_dom.php");

// remove the script's execution time limit (0 = no limit)
set_time_limit(0);

// counts the total number of links found
$nlinks = 0;

// string that receives each value from the parser
$string = '';

// array that stores the values from the parser
$toyota = array();

// counters
$cont = 0;
$x = 1;

/****************
    I NEED THE LOOP BELOW (WHILE) TO RUN UNTIL
    THE ARRAY (TOYOTA) IS FILLED WITH ALL THE LINKS FOUND,
    WITHOUT HAVING TO SPECIFY A STATIC LIMIT FOR THE COUNTER...
    THIS NEEDS TO BE DYNAMIC AND AUTOMATIC; IN THE CASE BELOW I HARD-CODED 4
*****************/

// loop while the counter is less than 4
while($cont < 4){   

// get DOM from URL or file
$html = file_get_html('http://www.webmotors.com.br/comprar/carros/novos-usados/'
.'sp-sao-paulo/toyota/?tipoveiculo=carros&tipoanuncio=novos-usados&anunciante=pessoa'
.'%20f%C3%ADsica&marca=toyota&vehicle1=%7B%22marca%22:%22toyota%22%7D&location=%5B%7B'
.'%22state%22:%22s%C3%A3o%20paulo%22,%22abbr%22:%22sp%22%7D%5D&precoate=170000&anode'
.'=2012&kmate=30000&atributos=%C3%9Anico%20dono&p='.$cont."&o=3&qt=36");

        // for each link found...
        foreach($html->find('a') as $e){
        $string = (string) $e->href;    

            // just checks whether the link does not have the string "comprar/toyota" at position 1
            if(strpos($string, 'comprar/toyota') != 1){
                unset($html);
            }else{
                // checks whether the link has the string "comprar/toyota" at position 1
                if(strpos($string, 'comprar/toyota') == 1){ 
                    // turns the string found into an active link
                    $link = "<a href='http://www.webmotors.com.br/".$string. "'>".$string. "</a>";

                    //echo $link."<br>";

                    unset($html);
                    $nlinks++;

                    // inserts the link into the array
                    $toyota[$nlinks] = $link;
                }                           
            }                       
        }
    $cont++;
    }

    // gets the size of the array
    $tam = sizeof($toyota);

    // loop while the counter is within the size of the array
    while($x <= $tam){  
        // prints the array element at position x
        echo $toyota[$x]."<br>";
        $x++;
    }

    echo "<br> ".$nlinks." TOYOTA cars were found!<br>";

Any help would be appreciated.

    
asked by anonymous 19.05.2015 / 15:39

1 answer

3

Do this:

$html = true;

while($html){   

// get DOM from URL or file
$html = file_get_html('http://www.webmotors.com.br/comprar/carros/novos-usados/'
.'sp-sao-paulo/toyota/?tipoveiculo=carros&tipoanuncio=novos-usados&anunciante=pessoa'
.'%20f%C3%ADsica&marca=toyota&vehicle1=%7B%22marca%22:%22toyota%22%7D&location=%5B%7B'
.'%22state%22:%22s%C3%A3o%20paulo%22,%22abbr%22:%22sp%22%7D%5D&precoate=170000&anode'
.'=2012&kmate=30000&atributos=%C3%9Anico%20dono&p='.$cont."&o=3&qt=36");

Why? Because when file_get_html cannot get the page you want, it returns false.

Note: the application has to handle this false value, because calling a method on it (such as $html->find()) causes a fatal error. An example of handling it:

if($html){
    // for each link found...
    foreach($html->find('a') as $e){
    }
}
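
Putting the two pieces together, here is a minimal sketch of a loop with no static page limit. It is not the only way to do it, and it rests on a few assumptions: collectAllLinks and makeSimpleHtmlDomFetcher are hypothetical helper names (they are not part of simple_html_dom), page numbers are assumed to start at 1, and a page that returns no matching links (or only links already seen) is treated as the end of the listing, since a site may answer an out-of-range page with valid HTML instead of a failure. The $maxPages cap is only a safety net against an endless loop.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): keeps requesting pages
// until the fetcher reports failure or a page contributes no new links.
// $fetchPageLinks receives a page number and returns an array of hrefs,
// or false when the page could not be loaded.
function collectAllLinks(callable $fetchPageLinks, $firstPage = 1, $maxPages = 500)
{
    $links = array();

    for ($page = $firstPage; $page < $firstPage + $maxPages; $page++) {
        $pageLinks = $fetchPageLinks($page);

        // Stop on a failed fetch or on a page with no matching links.
        if ($pageLinks === false || count($pageLinks) === 0) {
            break;
        }

        $before = count($links);
        foreach ($pageLinks as $href) {
            $links[$href] = true; // array keys de-duplicate repeated links
        }

        // Assumption: if a page added nothing new, the listing is exhausted
        // (some sites repeat the last page for out-of-range page numbers).
        if (count($links) === $before) {
            break;
        }
    }

    return array_keys($links);
}

// Sketch of a fetcher built on simple_html_dom's file_get_html().
// $urlPrefix is everything before the page number and $urlSuffix everything
// after it; both names are illustrative.
function makeSimpleHtmlDomFetcher($urlPrefix, $urlSuffix, $needle)
{
    return function ($page) use ($urlPrefix, $urlSuffix, $needle) {
        $html = file_get_html($urlPrefix . $page . $urlSuffix);
        if (!$html) {
            return false; // never call ->find() on false
        }

        $found = array();
        foreach ($html->find('a') as $e) {
            $href = (string) $e->href;
            // Same test as in the question: the needle must start at position 1.
            if (strpos($href, $needle) === 1) {
                $found[] = $href;
            }
        }

        $html->clear(); // release the DOM's memory before the next page
        return $found;
    };
}
```

With the URL from the question, $urlPrefix would be the part ending in "&p=", $urlSuffix would be "&o=3&qt=36" and $needle would be 'comprar/toyota'. The result of makeSimpleHtmlDomFetcher(...) is then passed to collectAllLinks(), after require ("simple_html_dom.php"). If the site counts pages from 0, as the $cont counter in the question does, pass 0 as $firstPage.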
    
19.05.2015 / 15:49