DOMXPath query with multiple classes

3

I'm parsing an HTML file with the following structure:

<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Get_text 6</li>                                 
                <li class="txtArea emp-un-area">Get_text 7</li>
                <li class="txtToilet emp-un-bath">Get_text 8</li>
                <li class="txtCar emp-un-park">Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
        </div>
    </div>
</div>
<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Other Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Other Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Other Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Other Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Other Get_text 6</li>                                   
                <li class="txtArea emp-un-area">Other Get_text 7</li>
                <li class="txtToilet emp-un-bath">Other Get_text 8</li>
                <li class="txtCar emp-un-park">Other Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
        </div>
    </div>
</div>

Edit

Each record is wrapped in the following block:

<div class="lstImv blackBd12"></div>

With this code:

<?php
$html = "exemplo_parse.html";
libxml_use_internal_errors(true);
$dom = new domDocument('1.0', 'utf-8'); 
$dom->loadHTMLFile($html); 
$dom->preserveWhiteSpace = false; 
$xpath = new DOMXPath($dom);
$content = $xpath->query('//div[@class="lstImv blackBd12"]');
foreach($content as $span)
{
    echo "<pre>";
        print_r($span);
    echo "</pre>";
}
?>

I get 2 objects with their values:

DOMElement Object
(
    [tagName] => div
    [schemaTypeInfo] => 
    [nodeName] => div
    [nodeValue] => 











        Get_text 1





                Get_text 2
                Get_text 3


                Get_text 4
                Get_text 5




            Get_text 6                                  
                Get_text 7
                Get_text 8
                Get_text 9


            Get_text 10



    [nodeType] => 1
    [parentNode] => (object value omitted)
    [childNodes] => (object value omitted)
    [firstChild] => (object value omitted)
    [lastChild] => (object value omitted)
    [previousSibling] => 
    [nextSibling] => (object value omitted)
    [attributes] => (object value omitted)
    [ownerDocument] => (object value omitted)
    [namespaceURI] => 
    [prefix] => 
    [localName] => div
    [baseURI] => 
    [textContent] => 











        Get_text 1





                Get_text 2
                Get_text 3


                Get_text 4
                Get_text 5




            Get_text 6                                  
                Get_text 7
                Get_text 8
                Get_text 9


            Get_text 10



)
DOMElement Object
(
    [tagName] => div
    [schemaTypeInfo] => 
    [nodeName] => div
    [nodeValue] => 











        Other Get_text 1





                Other Get_text 2
                Other Get_text 3


                Other Get_text 4
                Other Get_text 5




            Other Get_text 6                                
                Other Get_text 7
                Other Get_text 8
                Other Get_text 9


            Other Get_text 10



    [nodeType] => 1
    [parentNode] => (object value omitted)
    [childNodes] => (object value omitted)
    [firstChild] => (object value omitted)
    [lastChild] => (object value omitted)
    [previousSibling] => (object value omitted)
    [attributes] => (object value omitted)
    [ownerDocument] => (object value omitted)
    [namespaceURI] => 
    [prefix] => 
    [localName] => div
    [baseURI] => 
    [textContent] => 











        Other Get_text 1





                Other Get_text 2
                Other Get_text 3


                Other Get_text 4
                Other Get_text 5




            Other Get_text 6                                
                Other Get_text 7
                Other Get_text 8
                Other Get_text 9


            Other Get_text 10



)

So this is what I'm currently doing:

<?php
$html = "exemplo_parse.html";
libxml_use_internal_errors(true);
$dom = new domDocument('1.0', 'utf-8'); 
$dom->loadHTMLFile($html); 
$dom->preserveWhiteSpace = false; 
$xpath = new DOMXPath($dom);
$content = $xpath->query('//strong[@class="imvFse emp-fase"]');
foreach($content as $span)
{
    echo "Key 1 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//strong[@class="emp-nome infNme colorTxt"]');
foreach($content as $span)
{
    echo "Key 2 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part1 infLoc"]');
foreach($content as $span)
{
    echo "Key 3 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part2 infLoc"]');
foreach($content as $span)
{
    echo "Key 4 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtBed emp-un-dorms"]');
foreach($content as $span)
{
    echo "Key 5 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtArea emp-un-area"]');
foreach($content as $span)
{
    echo "Key 6 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtCar emp-un-park"]');
foreach($content as $span)
{
    echo "Key 7 : ".$span->textContent."<br/>";
}
?>

I get the data this way:

Key 1 : Get_text 1
Key 1 : Other Get_text 1
Key 2 : 
Key 2 : 
Key 3 : Get_text 2
Key 3 : Other Get_text 2
Key 4 : Get_text 3
Key 4 : Other Get_text 3
Key 5 : Get_text 6
Key 5 : Other Get_text 6
Key 6 : Get_text 7
Key 6 : Other Get_text 7
Key 7 : Get_text 9
Key 7 : Other Get_text 9

In other words, each query iterates over the textContent of every match in the document, so the results come out grouped by key (k1, k1, k2, k2, ..., k7, k7). I would like them grouped per block instead (k1, k2, ..., k7, k1, k2, ..., k7).

    
asked by anonymous 06.12.2016 / 05:40

2 answers

1

Yes. The query method accepts XPath expressions as its argument, so you can, for example, use the or operator to match either of the class values you want (XPath operator names are case-sensitive, so it must be lowercase or):

$content = $xpath->query('//strong[@class="imvFse emp-fase" or @class="emp-nome infNme colorTxt"]');
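Note that @class="..." compares the whole attribute string, so it only matches when the classes appear in exactly that order. As a minimal sketch, assuming you would rather match on a single class token, the usual XPath 1.0 idiom combines contains, concat and normalize-space. The hasClass helper and the shortened markup below are hypothetical, made up for illustration; they are not part of DOMXPath or of the question.

```php
<?php
// Hypothetical helper (not part of DOMXPath): builds a predicate that matches
// one class token, whatever other classes the element carries.
function hasClass(string $class): string
{
    return sprintf('contains(concat(" ", normalize-space(@class), " "), " %s ")', $class);
}

// Shortened stand-in for the markup in the question.
$html = '<div><strong class="imvFse emp-fase">A</strong>'
      . '<strong class="emp-nome infNme colorTxt">B</strong>'
      . '<strong class="emp-valor infVlr colorTxt">C</strong></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Combine two class-token predicates with the lowercase "or" operator.
$expr = '//strong[' . hasClass('emp-fase') . ' or ' . hasClass('emp-nome') . ']';

$texts = [];
foreach ($xpath->query($expr) as $node) {
    $texts[] = $node->textContent;
}

print_r($texts);
```

The padding spaces around the attribute value and the token keep a class such as emp-fase from matching a longer name that merely contains it.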

After the question was edited:

<?php 

$html = <<<HTML
<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Get_text 6</li>                                 
                <li class="txtArea emp-un-area">Get_text 7</li>
                <li class="txtToilet emp-un-bath">Get_text 8</li>
                <li class="txtCar emp-un-park">Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
        </div>
    </div>
</div>
<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Other Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Other Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Other Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Other Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Other Get_text 6</li>                                   
                <li class="txtArea emp-un-area">Other Get_text 7</li>
                <li class="txtToilet emp-un-bath">Other Get_text 8</li>
                <li class="txtCar emp-un-park">Other Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
        </div>
    </div>
</div>
HTML;

$dom = new domDocument('1.0', 'utf-8'); 
$dom->loadHTML($html); 
$dom->preserveWhiteSpace = false; 
$xpath = new DOMXPath($dom);

$content = $xpath->query('//div[@class="lstImv blackBd12"]');

$return = [];


foreach ($content as $nodeKey => $nodeValue) {

    // The leading "." makes each query relative to the current block ($nodeValue).
    // A path starting with "//" searches the whole document even when a context node is passed,
    // so item(0) here is the first match inside this block.
    $return[$nodeKey][1] = $xpath->query('.//strong[@class="imvFse emp-fase"]', $nodeValue)->item(0)->nodeValue;
    $return[$nodeKey][2] = $xpath->query('.//strong[@class="emp-nome infNme colorTxt"]', $nodeValue)->item(0)->nodeValue;
    $return[$nodeKey][3] = $xpath->query('.//span[@class="emp-loc-part1 infLoc"]', $nodeValue)->item(0)->nodeValue;
    $return[$nodeKey][4] = $xpath->query('.//span[@class="emp-loc-part2 infLoc"]', $nodeValue)->item(0)->nodeValue;
    $return[$nodeKey][5] = $xpath->query('.//li[@class="txtBed emp-un-dorms"]', $nodeValue)->item(0)->nodeValue;
    $return[$nodeKey][6] = $xpath->query('.//li[@class="txtArea emp-un-area"]', $nodeValue)->item(0)->nodeValue;
    $return[$nodeKey][7] = $xpath->query('.//li[@class="txtCar emp-un-park"]', $nodeValue)->item(0)->nodeValue;
}

var_dump($return);
    
06.12.2016 / 15:22
2

Here is the solution I came up with:

<?php
$html = <<<HTML
<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Get_text 6</li>                                 
                <li class="txtArea emp-un-area">Get_text 7</li>
                <li class="txtToilet emp-un-bath">Get_text 8</li>
                <li class="txtCar emp-un-park">Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
        </div>
    </div>
</div>
<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Other Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Other Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Other Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Other Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Other Get_text 6</li>                                   
                <li class="txtArea emp-un-area">Other Get_text 7</li>
                <li class="txtToilet emp-un-bath">Other Get_text 8</li>
                <li class="txtCar emp-un-park">Other Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
        </div>
    </div>
</div>
HTML;

$dom = new domDocument('1.0', 'utf-8'); 
$dom->loadHTML($html); 
$dom->preserveWhiteSpace = false; 
$xpath = new DOMXPath($dom);


$items = $xpath->query('//div[@class="lstImv blackBd12"]');
for($i = 0; $i < $items->length; $i++)
{
    $status = $xpath->query('//strong[@class="imvFse emp-fase"]');
    echo "Value     :".$status->item($i)->nodeValue."<br/>";    

    $titulo = $xpath->query('//span[@class="emp-loc-part1 infLoc"]');
    echo "Value     :".$titulo->item($i)->nodeValue."<br/>";

    $titulo2 = $xpath->query('//span[@class="emp-loc-part2 infLoc"]');
    echo "Value     :".$titulo2->item($i)->nodeValue."<br/>";   

    $valor = $xpath->query('//em[@class="emp-valor-apartir"]');
    echo "Value     :".$valor->item($i)->nodeValue."<br/>"; 

    $valor2 = $xpath->query('//strong[@class="emp-valor infVlr colorTxt"]');
    echo "Value     :".$valor2->item($i)->nodeValue."<br/>";

    $dorm = $xpath->query('//li[@class="txtBed emp-un-dorms"]');
    echo "Value     :".$dorm->item($i)->nodeValue."<br/>";

    $tam = $xpath->query('//li[@class="txtArea emp-un-area"]');
    echo "Value     :".$tam->item($i)->nodeValue."<br/>";   

}
?>

See the ideone
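The loop above works because every block contains each element exactly once, so position $i in a document-wide result lines up with block $i. If a block were missing an element, the positions would shift. As a hedged alternative, here is a sketch that scopes each lookup to its own block by passing the block as the context node and starting the path with a dot. The field names (fase, loc1, dorms) and the shortened markup are assumptions made for illustration; the class strings come from the question.

```php
<?php
// Shortened stand-in for the markup in the question (two blocks, three fields each).
$html = <<<HTML
<div class="lstImv blackBd12">
    <strong class="imvFse emp-fase">Get_text 1</strong>
    <span class="emp-loc-part1 infLoc">Get_text 2</span>
    <ul><li class="txtBed emp-un-dorms">Get_text 6</li></ul>
</div>
<div class="lstImv blackBd12">
    <strong class="imvFse emp-fase">Other Get_text 1</strong>
    <span class="emp-loc-part1 infLoc">Other Get_text 2</span>
    <ul><li class="txtBed emp-un-dorms">Other Get_text 6</li></ul>
</div>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Illustrative field names mapped to the class attribute values to look up.
$fields = [
    'fase'  => 'imvFse emp-fase',
    'loc1'  => 'emp-loc-part1 infLoc',
    'dorms' => 'txtBed emp-un-dorms',
];

$rows = [];
foreach ($xpath->query('//div[@class="lstImv blackBd12"]') as $block) {
    $row = [];
    foreach ($fields as $name => $class) {
        // ".//*" searches only inside $block; string() yields "" when nothing matches.
        $row[$name] = trim($xpath->evaluate('string(.//*[@class="' . $class . '"])', $block));
    }
    $rows[] = $row;
}

print_r($rows);
```

Each entry of $rows then holds one block's values in order, which gives the k1, k2, ..., k1, k2, ... grouping the question asks for.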

    
06.12.2016 / 23:21