How do I get the values inside several tags?

I have the following HTML page:

<!DOCTYPE html>
<html>

    <head>
        <title>Exemplo</title>
    </head>
    <body>
        <div id="text">Valor 1</div>
        <div id="text">Valor 2</div>
        <div id="text">Valor 3</div>
    </body>

</html>

I'm using the following function in PHP to get the text between a tag:

    function capturar($string, $start, $end) {
        $str = explode($start, $string);
        $str = explode($end, $str[1]);
        return $str[0];
    }

Example usage:

<?php
$url = file_get_contents('http://localhost/exemplo.html');
$valor = capturar($url, '<div id="text">', '</div>');
echo $valor;

However, when the page contains several identical tags with different text in each, the function only returns the text from the first one.

How can I get the text from every (<div id="text">, </div>) pair?

    
asked by anonymous 19.11.2017 / 04:06

1 answer

PHP already has native classes for handling HTML (DOMDocument and DOMXPath). I don't think using regex for this purpose is recommended.

First, fetch the HTML with file_get_contents or cURL. Since you are already using file_get_contents, I'll keep it:

$html = file_get_contents('http://localhost/exemplo.html');

Then, assuming there was no error retrieving the content, create a DOMDocument and a DOMXPath from that content so we can query it:

$DOM = new DOMDocument;
$DOM->loadHTML($html);
$XPath = new DOMXPath($DOM);
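
The snippet above assumes the download succeeded and the markup parses cleanly. A minimal sketch of a more defensive version follows. The helper name carregarDom is hypothetical, and an inline HTML string stands in for the remote page so the example runs on its own. It relies on libxml_use_internal_errors to keep parser warnings (for example about repeated ids) out of the output:

```php
<?php
// Hypothetical helper: builds a DOMXPath from an HTML string, or returns null on failure.
function carregarDom($html) {
    // file_get_contents returns false on failure; an empty string cannot be parsed either.
    if ($html === false || $html === '') {
        return null;
    }
    // Collect libxml parse warnings internally instead of printing them.
    $anterior = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $ok = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($anterior); // restore the previous setting
    return $ok ? new DOMXPath($dom) : null;
}

// Inline sample standing in for file_get_contents('http://localhost/exemplo.html').
$html = '<!DOCTYPE html><html><body>'
      . '<div id="text">Valor 1</div>'
      . '<div id="text">Valor 2</div>'
      . '</body></html>';
$xpath = carregarDom($html);
```

If carregarDom returns null, stop there and report the error rather than querying.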

Now, just search for what we want, using XPath:

$divs = $XPath->query('//div[@id="text"]');

If any elements are found, we can loop over them. To display the content of each one, use nodeValue:

foreach($divs as $div){
    echo $div->nodeValue;
    echo '<br>';
}

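Putting it all together:

// Fetch the page.
$html = file_get_contents('http://localhost/exemplo.html');

// Parse it and prepare an XPath context.
$DOM = new DOMDocument;
$DOM->loadHTML($html);
$XPath = new DOMXPath($DOM);

// Select every div whose id attribute is "text".
$divs = $XPath->query('//div[@id="text"]');

// Print the text of each match on its own line.
foreach ($divs as $div) {
    echo $div->nodeValue;
    echo '<br>';
}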

Result:

Valor 1
Valor 2
Valor 3
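
If you prefer something closer to your original capturar function, the same steps can be wrapped in a helper that returns an array. This is only a sketch: the name capturarTodos and its parameters are my own, and the HTML is inlined so the example is self-contained.

```php
<?php
// Hypothetical helper: returns the trimmed text of every node matching an XPath expression.
function capturarTodos($html, $expressao) {
    $anterior = libxml_use_internal_errors(true); // keep parser warnings out of the output
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($anterior);

    $xpath = new DOMXPath($dom);
    $nos = $xpath->query($expressao);
    if ($nos === false) {
        return []; // the XPath expression was invalid
    }

    $textos = [];
    foreach ($nos as $no) {
        $textos[] = trim($no->textContent);
    }
    return $textos;
}

// Inline copy of the example page.
$html = '<!DOCTYPE html><html><head><title>Exemplo</title></head><body>'
      . '<div id="text">Valor 1</div>'
      . '<div id="text">Valor 2</div>'
      . '<div id="text">Valor 3</div>'
      . '</body></html>';

$valores = capturarTodos($html, '//div[@id="text"]');
print_r($valores);
```

Returning an array instead of echoing lets the caller decide how to display or store the values.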

Additionally, you should not repeat the same id. An id must be unique within the document, so having more than one element with id="text" is invalid. Use a class for repeated elements instead.
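
As a sketch of what that could look like, assume the page is changed so the divs carry class="text" (this markup is my assumption, not your current page). XPath 1.0 has no class selector, so the usual idiom pads the attribute with spaces and matches the class as a whole token:

```php
<?php
// Sample markup assumed for illustration: a class replaces the repeated id.
$html = '<!DOCTYPE html><html><body>'
      . '<div class="text">Valor 1</div>'
      . '<div class="text destaque">Valor 2</div>'
      . '<div class="textual">Outro</div>'
      . '</body></html>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Match "text" as a whole class token, so class="textual" is not selected.
$consulta = '//div[contains(concat(" ", normalize-space(@class), " "), " text ")]';

$valores = [];
foreach ($xpath->query($consulta) as $div) {
    $valores[] = $div->nodeValue;
}
print_r($valores);
```

A plain contains(@class, "text") would also match class="textual", which is why the padded form is used here.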

    
20.11.2017 / 19:09