How do I get the <a> tag inside a div using XPath?

2

I'm trying to get the data from a DIV that contains the following structure:

<div class="item" style="height:273px">
<a href="/arapiraca/anuncios/detalhes/159695-honda-cg-150-2008">
    <img alt="" src="/i.php?w=148&h=119&f=3,0|4,0&src=uploads/anunciosfotos/2014/04/858a6126588bdace8bc0f144f900d097.jpg"></img>
    <img src="/img/icone-novo.png" alt="" style="position: absolute; z-index: 20; width: 60px; height: 60px; right: -5px; top: -10px; border: 0"></img>
    <strong class="nome" style="font-weight:normal">
        HONDA CG 150 2008 TITAN - KS GASOLINA
    </strong>
    <strong class="valor">
        R$ 4.500,00
    </strong>
    <span class="vendedor">
        <span>
            <img alt="" src="/i.php?w=148&h=60&src=uploads/clientes/2659aa7030bac6f245852b948187188a.jpg"></img>
        </span>
    </span>
</a>
<input class="comparacao" type="checkbox" name="comparacao[159695]" value="159695"></input>
</div>

$dom = new DOMDocument();
@$dom->loadHTML($content);

$xpath = new DOMXPath($dom);
$classname = "item";
$nodes = $xpath->query("//*[@class='" . $classname . "']");

foreach ($nodes as $node) {
     echo $node->nodeValue . " <br> ";
}

With the code above, I can only get the following result:

HONDA CG 150 2008 TITAN - KS GASOLINA R$ 4.500,00 

I also need to get the <a> tags.

    
asked by anonymous 23.04.2014 / 16:32

2 answers

1

The XPath expression you are using returns every element whose class attribute has the value item:

//*[@class='item']

The result is a node list. Your code loops over the nodes in that list, one of which is the div you are showing.

If you print such a node as a string (nodeValue), you get only the text content of the element and its descendants; the tags themselves are not included. You can, however, use more specific XPath expressions to select exactly what you want.
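As a minimal sketch of the difference, the snippet below prints the same node twice: once through nodeValue and once through DOMDocument::saveHTML($node), which serializes the node with its tags. The $content string is a shortened stand-in for the markup in the question (an assumption), and libxml_use_internal_errors() is used in place of the @ operator.

```php
<?php
// Shortened stand-in for the markup in the question (assumption).
$content = '<div class="item"><a href="/anuncios/detalhes/159695">'
         . '<strong class="nome">HONDA CG 150</strong></a></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of using @
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//*[@class='item']");

$textOnly = '';
$markup   = '';
foreach ($nodes as $node) {
    $textOnly .= $node->nodeValue;       // text of all descendants, tags are lost
    $markup   .= $dom->saveHTML($node);  // serialized element, tags included
}

echo $textOnly, "\n";
echo $markup, "\n";
```

If the goal is to keep the markup of the matched element, serializing the node is one option; if only specific pieces are needed, a narrower XPath expression is usually simpler.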

To get the a element inside that div, you just need to add one more step:

//*[@class='item']/a

In the case above, XPath returns an element. If you want the value of the href attribute of the a element, add another step containing @href or attribute::href:

//*[@class='item']/a/@href
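A short sketch of both approaches in PHP, assuming a shortened version of the question's markup in $content: the first query selects the a elements and reads the attribute with getAttribute(), the second selects the href attribute nodes directly.

```php
<?php
// Shortened stand-in for the markup in the question (assumption).
$content = '<div class="item"><a href="/anuncios/detalhes/159695">'
         . '<strong class="nome">HONDA CG 150</strong></a></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($content);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Option 1: select the <a> elements, then read the attribute from each one.
$hrefsFromElements = [];
foreach ($xpath->query("//*[@class='item']/a") as $a) {
    $hrefsFromElements[] = $a->getAttribute('href');
}

// Option 2: select the href attribute nodes directly.
$hrefsFromAttributes = [];
foreach ($xpath->query("//*[@class='item']/a/@href") as $attr) {
    $hrefsFromAttributes[] = $attr->nodeValue;
}

print_r($hrefsFromElements);
print_r($hrefsFromAttributes);
```

Option 1 keeps the whole element available, which helps when you also need its children; option 2 is shorter when only the URL matters.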

I was not sure whether you wanted to extract the text inside <a>. If so (for example, the text content of <strong class='nome'>), you can do that directly in XPath using:

//*[@class='item']//*[@class='nome']/text()

Strictly speaking, text() is a node test rather than a function: it selects the text nodes that are children of the element instead of the element itself. This affects how you use the result. In PHP you can still read the content of each text node with nodeValue, but a text node has no attributes, so you cannot reach the attributes of the element that contains it through the node alone.
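As an illustration, with PHP's DOMXPath the query() method returns text nodes for this expression, so their content can be read with nodeValue. If a plain string is preferred, DOMXPath::evaluate() with an XPath string function such as normalize-space() is one way to get it. The $content value is a shortened stand-in for the question's markup (an assumption).

```php
<?php
// Shortened stand-in for the markup in the question (assumption).
$content = '<div class="item"><a href="/anuncios/detalhes/159695">'
         . '<strong class="nome">  HONDA CG 150  </strong></a></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($content);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// query() returns a list of text nodes; trim() removes surrounding whitespace.
$names = [];
foreach ($xpath->query("//*[@class='item']//*[@class='nome']/text()") as $textNode) {
    $names[] = trim($textNode->nodeValue);
}

// evaluate() with a string function returns a PHP string for the first match.
$firstName = $xpath->evaluate("normalize-space(//*[@class='item']//*[@class='nome'])");

print_r($names);
echo $firstName, "\n";
```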

02.05.2014 / 15:29

1

You can run a second query with just //a/@href, or change the query so that it returns the union of two node sets, using the | operator:

//*[@class='item'] | //a/@href

You will then need to adapt the foreach loop, since the result mixes element nodes and attribute nodes.
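One possible way to handle the mixed result, sketched under the assumption that $content holds a shortened version of the question's markup, is to check the node type inside the loop:

```php
<?php
// Shortened stand-in for the markup in the question (assumption).
$content = '<div class="item"><a href="/anuncios/detalhes/159695">'
         . '<strong class="nome">HONDA CG 150</strong></a></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($content);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$lines = [];
foreach ($xpath->query("//*[@class='item'] | //a/@href") as $node) {
    if ($node instanceof DOMAttr) {
        // Attribute node: this is the href value.
        $lines[] = 'href: ' . $node->value;
    } else {
        // Element node: this is the text content of the item.
        $lines[] = 'text: ' . trim($node->nodeValue);
    }
}

echo implode("\n", $lines), "\n";
```

Note that //a/@href matches every link in the document, not only those inside an item; narrowing it to //*[@class='item']/a/@href avoids unrelated links.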

Good work!

    
25.04.2014 / 19:08