I need to retrieve information from a page. How can I continue what I started?

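<?php

header('Content-Type: text/html; charset=utf-8');

$ch = curl_init();
$timeout = 0; // 0 = wait indefinitely while connecting
curl_setopt($ch, CURLOPT_URL, 'http://www.cidades.ibge.gov.br/xtras/uf.php?lang=&coduf=17&search=tocantins');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page as a string instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$conteudo = curl_exec($ch);
if ($conteudo === false) {
    die('cURL error: ' . curl_error($ch));
}
curl_close($ch);

highlight_string($conteudo); // show the raw HTML for inspection

?>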

All the page content I'm retrieving ends up in $conteudo. This <ul> will contain tens or hundreds of results, and I need to get all of them:

<ul id="lista_municipios">
    <li id="">
        <a href="perfil.php?lang=&codmun=170025&search=tocantins|item1">item2</a>
    </li>
    <li>....
    <li>....
</ul>

I need to get item1 (the part of the href after the |) and item2 (the link text).

    
asked by anonymous 16.02.2015 / 17:11


Here is an example that uses phpQuery (the single-file build, phpQuery-onefile.php) to look up CEPs (Brazilian postal codes). The cURL part is not included because the focus is on using phpQuery, and this is only one of several possible solutions.

  

phpQuery: link

$body = (string) $client->send($request)->getBody(); // This would be your HTML
// Load phpQuery if it is not already available
if (!class_exists('phpQuery')) {
    require_once __DIR__ . DIRECTORY_SEPARATOR . 'phpQuery-onefile.php';
}
// Initialize the document; replace $body with your variable containing the HTML
$doc = \phpQuery::newDocumentHTML($body, 'utf-8');
$resultados = [];
// Iterate over the table rows
foreach (\phpQuery::pq('table[cellpadding="5"]')->find('tr') as $linha) {
    $dados = [];
    foreach (\phpQuery::pq($linha)->find('td') as $coluna) {
        // Collapse whitespace, trim and decode HTML entities in each cell
        $dados[] = htmlspecialchars_decode(trim(preg_replace('/\s+/', ' ', \phpQuery::pq($coluna)->html())));
    }
    // Skip rows without the five expected cells (e.g. header rows)
    if (count($dados) < 5) {
        continue;
    }
    $resultados[] = [
        'logradouro' => $dados[0],
        'bairro'     => $dados[1],
        'localidade' => $dados[2],
        'uf'         => $dados[3],
        'cep'        => $dados[4],
    ];
}
return $resultados; // this snippet is meant to run inside a function
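The row-to-record mapping in the loop above can also be pulled out into a small pure function, which makes it easy to test without fetching any page. This is a hypothetical refactoring (mapearLinha is not part of phpQuery); it keeps the same cell normalization and the same five keys, and it assumes rows with fewer than five cells should be skipped.

```php
<?php
// Hypothetical helper: turns the raw cell contents of one table row into a CEP record.
// Assumption: rows with fewer than five cells (e.g. header rows) are not data rows.
function mapearLinha(array $celulas): ?array
{
    $chaves = ['logradouro', 'bairro', 'localidade', 'uf', 'cep'];
    // Same normalization as the loop: collapse whitespace, trim, decode HTML entities
    $valores = array_values(array_map(
        fn (string $c): string => htmlspecialchars_decode(trim(preg_replace('/\s+/', ' ', $c))),
        $celulas
    ));
    if (count($valores) < count($chaves)) {
        return null;
    }
    // Like the original loop, only the first five cells are used
    return array_combine($chaves, array_slice($valores, 0, count($chaves)));
}

print_r(mapearLinha(["Rua  A\n", ' Centro ', 'Palmas', 'TO', '77000-000']));
```

Inside the phpQuery loop you would collect the cell HTML into $dados and append mapearLinha($dados) to the results whenever it is not null.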

Applied to your case, it would look something like this:

// Load phpQuery if it is not already available
if (!class_exists('phpQuery')) {
    require_once __DIR__ . DIRECTORY_SEPARATOR . 'phpQuery-onefile.php';
}
// Initialize the document with the HTML fetched by cURL ($conteudo in your code)
$doc = \phpQuery::newDocumentHTML($conteudo, 'utf-8');

$municipios = [];
foreach (\phpQuery::pq('ul#lista_municipios')->find('li') as $linha) {
    // The data lives on the <a> inside each <li>, not on the <li> itself
    $link = \phpQuery::pq($linha)->find('a');
    $item2 = trim($link->text());              // item2 (link text)
    $valorAttr = (string) $link->attr('href'); // full href, kept in case you need it
    $partes = explode('|', $valorAttr, 2);
    $item1 = isset($partes[1]) ? $partes[1] : null; // item1 (part after "|")
    $municipios[] = ['item1' => $item1, 'item2' => $item2, 'href' => $valorAttr];
}
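The explode('|', ...) call above assumes every href contains a |. A somewhat sturdier sketch parses the query string first with PHP's built-in parse_url() and parse_str(), which also exposes codmun. dividirHref is a hypothetical helper name, and the assumption that item1 is the tail of the search parameter comes only from the sample HTML in the question.

```php
<?php
// Hypothetical helper: splits an href such as
// "perfil.php?lang=&codmun=170025&search=tocantins|item1" into its parts.
// Assumption: item1 is whatever follows the first "|" in the "search" parameter.
function dividirHref(string $href): array
{
    // parse_url also accepts relative URLs; a false/null result becomes ''
    $query = (string) parse_url($href, PHP_URL_QUERY);
    parse_str($query, $params);
    $search = $params['search'] ?? '';
    $partes = explode('|', $search, 2);
    return [
        'codmun' => $params['codmun'] ?? null,
        'search' => $partes[0],
        'item1'  => $partes[1] ?? null, // null when there is no "|"
    ];
}

print_r(dividirHref('perfil.php?lang=&codmun=170025&search=tocantins|item1'));
```

Note that parse_str() URL-decodes the values, so a + in the href would come back as a space.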

In any case, this would still need testing and adapting to your needs.
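If you would rather not depend on phpQuery, PHP's built-in DOM extension (DOMDocument plus DOMXPath) can do the same extraction. This is a swapped-in alternative, not part of the phpQuery approach; the function name extrairMunicipios is hypothetical, and it assumes every <li> holds a single <a> whose href ends in |item1, as in the sample from the question.

```php
<?php
// Hypothetical helper using the built-in DOM extension instead of phpQuery.
// Assumption: each <li> in #lista_municipios contains one <a> whose href ends in "|item1".
function extrairMunicipios(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely strict; collect libxml warnings instead of printing them
    $anterior = libxml_use_internal_errors(true);
    // The XML declaration prefix makes libxml read the input as UTF-8
    $dom->loadHTML('<?xml encoding="utf-8"?>' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($anterior);

    $xpath = new DOMXPath($dom);
    $resultado = [];
    foreach ($xpath->query('//ul[@id="lista_municipios"]/li/a') as $link) {
        $partes = explode('|', $link->getAttribute('href'), 2);
        $resultado[] = [
            'item1' => $partes[1] ?? null,      // part of the href after "|"
            'item2' => trim($link->textContent), // link text
        ];
    }
    return $resultado;
}

$html = '<ul id="lista_municipios">'
    . '<li id=""><a href="perfil.php?lang=&codmun=170025&search=tocantins|item1">item2</a></li>'
    . '</ul>';
print_r(extrairMunicipios($html));
```

With the page fetched by your cURL code you would call extrairMunicipios($conteudo) and loop over the returned array.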

    
16.02.2015 / 17:28