How to extract specific data from an HTML file with PHP?

1

I'd like to know how I can extract some of the contents of an HTML file. The file contains dozens of names and emails, and I would like to extract this data. Can anyone help me do this?

<div class="tcell tquick">
  <div style="background-color: #ddd; padding: 4px;"> 
      <span> <b class="the_nome">Marcos Vinícius Nascimento Pereira;</b> </span> 
  </div>
  <br>
  <div> </div>
  <div>
    <div class="c the_email">[email protected]</div>
  </div>
  <div> </div>
</div>

In this case I would like to extract the name (the element with class the_nome) and the email (class the_email) with PHP.

    
asked by anonymous 07.08.2015 / 03:55

2 answers

5

PHP supports XPath, which is well suited to this type of task. Suppose your HTML looks like this and is available at a URL such as http://localhost/emails.html:

<!DOCTYPE html>
<html>
<head></head>
<body>
    <div class="tcell tquick">
        <div style="background-color: #ddd; padding: 4px;">
            <span> <b class="the_nome">Ciclano;</b> </span>
        </div>
        <br>
        <div> </div>
        <div>
            <div class="c the_email">[email protected]</div>
        </div>
        <div> </div>
    </div>
    <div class="tcell tquick">
        <div style="background-color: #ddd; padding: 4px;">
            <span> <b class="the_nome">Fulano;</b> </span>
        </div>
        <br>
        <div> </div>
        <div>
            <div class="c the_email">[email protected]</div>
        </div>
        <div> </div>
    </div>
</body>
</html>

Then you can load this content into a string, parse it with DOMDocument, and query it with another class called DOMXPath:

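<?php

$html_content = file_get_contents('http://localhost/emails.html');

// Collect libxml parse warnings internally instead of hiding them with @
libxml_use_internal_errors(true);

// loadHTML() is an instance method; calling it statically is an error on PHP 8
$dom = new DOMDocument();
$dom->loadHTML($html_content);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div[@class="tcell tquick"]');

foreach ($nodes as $node) {
    // These paths are relative to the current "tcell tquick" block
    $nome  = $xpath->query('div/span/b[@class="the_nome"]', $node)->item(0);
    $email = $xpath->query('div/div[@class="c the_email"]', $node)->item(0);

    if ($nome === null || $email === null) {
        continue; // skip blocks that are missing either field
    }

    echo $nome->nodeValue  . PHP_EOL;
    echo $email->nodeValue . PHP_EOL;
}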

This prints each name followed by its email, one per line, which is what you need.
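If you would rather collect the results than print them, the same XPath approach can be wrapped in a function. This is only a sketch under a few assumptions: extract_contacts() and class_test() are hypothetical helper names, the address in the usage example is a placeholder, the XML prolog is a common workaround for hinting UTF-8 to libxml, and the trailing ";" is stripped because the names in the sample markup end with one.

```php
<?php

// Hypothetical helper: XPath predicate matching one class among several.
function class_test(string $class): string
{
    return 'contains(concat(" ", normalize-space(@class), " "), " ' . $class . ' ")';
}

// Hypothetical helper: returns a list of ['name' => ..., 'email' => ...] rows.
function extract_contacts(string $html): array
{
    // Keep libxml from emitting warnings on imperfect markup.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // Common workaround: the prolog hints that the input is UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath    = new DOMXPath($dom);
    $contacts = [];

    foreach ($xpath->query('//div[' . class_test('tcell') . ']') as $cell) {
        $name  = $xpath->query('.//*[' . class_test('the_nome') . ']', $cell)->item(0);
        $email = $xpath->query('.//*[' . class_test('the_email') . ']', $cell)->item(0);

        if ($name === null || $email === null) {
            continue; // skip cells that are missing either field
        }

        $contacts[] = [
            // Trim whitespace, then drop the trailing ";" seen in the sample.
            'name'  => rtrim(trim($name->textContent), ';'),
            'email' => trim($email->textContent),
        ];
    }

    return $contacts;
}
```

For example, extract_contacts(file_get_contents('emails.html')) would return one row per "tcell" block. Matching on a single class name rather than the full class attribute keeps the query working if the order or number of classes changes.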

    
07.08.2015 / 04:58
1

You can also do this with regular expressions. I wrote a similar example when I had to get the price of soybeans from the rural channel's website.
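As a rough illustration of the regular expression approach applied to the markup in the question, something like the following could work. It is a sketch, not the example mentioned above: the sample HTML and the address are placeholders, and the patterns assume the markup is exactly as regular as shown. Regular expressions break easily when the HTML changes, so a parser is usually the safer choice.

```php
<?php

// Sample input modeled on the question's markup; the address is a placeholder.
$html = <<<'HTML'
<div class="tcell tquick">
  <div><span> <b class="the_nome">Fulano;</b> </span></div>
  <div><div class="c the_email">fulano@example.com</div></div>
</div>
HTML;

// Capture the text inside the element carrying each class.
// The name pattern also drops the optional trailing ";".
preg_match_all('~class="the_nome">\s*([^<]*?)\s*;?\s*</b>~u', $html, $names);
preg_match_all('~class="c the_email">\s*([^<]+?)\s*</div>~u', $html, $emails);

// Pair names and emails by position; this assumes every block has both.
$pairs = array_map(null, $names[1], $emails[1]);

foreach ($pairs as [$name, $email]) {
    echo $name . ' <' . $email . '>' . PHP_EOL;
}
```

Pairing by position is the weak point here: if one block lacks an email, every later pair shifts by one.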

    
08.08.2015 / 02:59