I'm parsing an HTML file with the following structure:
<div class="lstImv blackBd12">
<div class="stCl3 stLeft imvImg">
<div class="imgBox">
<a class="emp-imgs-link">
<span class="imgFrm frmBig frmLeft">
<img class="emp-img-principal">
</span>
<span class="imgFrm frmMd frmTop">
<img class="emp-img-logo">
</span>
<span class="imgFrm frmMd frmBot">
<img class="emp-img-foto">
</span>
</a>
</div>
<strong class="imvFse emp-fase">Get_text 1</strong>
</div>
<div class="imvInf stCl3 stRight">
<div class="infHd">
<div class="hdLeft stCl2">
<strong class="emp-nome infNme colorTxt"></strong>
<span class="emp-loc-part1 infLoc">Get_text 2</span>
<span class="emp-loc-part2 infLoc">Get_text 3</span>
</div>
<div class="hdRight stCl1">
<em class="emp-valor-apartir" >Get_text 4</em>
<strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
</div>
</div>
<div class="infTxt">
<p class="blackTxt60 emp-descritivo"></p>
<ul>
<li class="txtBed emp-un-dorms">Get_text 6</li>
<li class="txtArea emp-un-area">Get_text 7</li>
<li class="txtToilet emp-un-bath">Get_text 8</li>
<li class="txtCar emp-un-park">Get_text 9</li>
</ul>
</div>
<div class="infBt">
<a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
</div>
</div>
</div>
<div class="lstImv blackBd12">
<div class="stCl3 stLeft imvImg">
<div class="imgBox">
<a class="emp-imgs-link">
<span class="imgFrm frmBig frmLeft">
<img class="emp-img-principal">
</span>
<span class="imgFrm frmMd frmTop">
<img class="emp-img-logo">
</span>
<span class="imgFrm frmMd frmBot">
<img class="emp-img-foto">
</span>
</a>
</div>
<strong class="imvFse emp-fase">Other Get_text 1</strong>
</div>
<div class="imvInf stCl3 stRight">
<div class="infHd">
<div class="hdLeft stCl2">
<strong class="emp-nome infNme colorTxt"></strong>
<span class="emp-loc-part1 infLoc">Other Get_text 2</span>
<span class="emp-loc-part2 infLoc">Other Get_text 3</span>
</div>
<div class="hdRight stCl1">
<em class="emp-valor-apartir" >Other Get_text 4</em>
<strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
</div>
</div>
<div class="infTxt">
<p class="blackTxt60 emp-descritivo"></p>
<ul>
<li class="txtBed emp-un-dorms">Other Get_text 6</li>
<li class="txtArea emp-un-area">Other Get_text 7</li>
<li class="txtToilet emp-un-bath">Other Get_text 8</li>
<li class="txtCar emp-un-park">Other Get_text 9</li>
</ul>
</div>
<div class="infBt">
<a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
</div>
</div>
</div>
Edit: the block I want to iterate over is:
<div class="lstImv blackBd12"></div>
When I query it with this code:
<?php
$html = "exemplo_parse.html";
libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'utf-8');
$dom->preserveWhiteSpace = false; // set before loading; it has no effect afterwards
$dom->loadHTMLFile($html);
$xpath = new DOMXPath($dom);
$content = $xpath->query('//div[@class="lstImv blackBd12"]');
foreach($content as $span)
{
echo "<pre>";
print_r($span);
echo "</pre>";
}
?>
I get 2 objects with their values:
DOMElement Object
(
[tagName] => div
[schemaTypeInfo] =>
[nodeName] => div
[nodeValue] =>
Get_text 1
Get_text 2
Get_text 3
Get_text 4
Get_text 5
Get_text 6
Get_text 7
Get_text 8
Get_text 9
Get_text 10
[nodeType] => 1
[parentNode] => (object value omitted)
[childNodes] => (object value omitted)
[firstChild] => (object value omitted)
[lastChild] => (object value omitted)
[previousSibling] =>
[nextSibling] => (object value omitted)
[attributes] => (object value omitted)
[ownerDocument] => (object value omitted)
[namespaceURI] =>
[prefix] =>
[localName] => div
[baseURI] =>
[textContent] =>
Get_text 1
Get_text 2
Get_text 3
Get_text 4
Get_text 5
Get_text 6
Get_text 7
Get_text 8
Get_text 9
Get_text 10
)
DOMElement Object
(
[tagName] => div
[schemaTypeInfo] =>
[nodeName] => div
[nodeValue] =>
Other Get_text 1
Other Get_text 2
Other Get_text 3
Other Get_text 4
Other Get_text 5
Other Get_text 6
Other Get_text 7
Other Get_text 8
Other Get_text 9
Other Get_text 10
[nodeType] => 1
[parentNode] => (object value omitted)
[childNodes] => (object value omitted)
[firstChild] => (object value omitted)
[lastChild] => (object value omitted)
[previousSibling] => (object value omitted)
[attributes] => (object value omitted)
[ownerDocument] => (object value omitted)
[namespaceURI] =>
[prefix] =>
[localName] => div
[baseURI] =>
[textContent] =>
Other Get_text 1
Other Get_text 2
Other Get_text 3
Other Get_text 4
Other Get_text 5
Other Get_text 6
Other Get_text 7
Other Get_text 8
Other Get_text 9
Other Get_text 10
)
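The dump above shows all ten strings in both nodeValue and textContent because, for an element, those properties hold the concatenated text of every descendant. As a rough sketch (the inline HTML is a trimmed stand-in for the real file, and $texts is just an illustrative name), the non-blank text nodes of one block can be collected individually with a query relative to that block:

```php
<?php
// Trimmed stand-in for one block of exemplo_parse.html (assumption for illustration).
$html = <<<'HTML'
<div class="lstImv blackBd12">
  <strong class="imvFse emp-fase">Get_text 1</strong>
  <div><span class="emp-loc-part1 infLoc">Get_text 2</span></div>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// First matching block.
$block = $xpath->query('//div[@class="lstImv blackBd12"]')->item(0);

// Select each descendant text node that is not whitespace-only.
// The leading "." and the second argument keep the query inside $block.
$texts = [];
foreach ($xpath->query('.//text()[normalize-space()]', $block) as $textNode) {
    $texts[] = trim($textNode->nodeValue);
}

print_r($texts);
```

This gives one array entry per piece of text, but it does not say which class each string came from, so it only helps when position alone identifies the field.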
So this is how I'm currently doing it:
<?php
$html = "exemplo_parse.html";
libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'utf-8');
$dom->preserveWhiteSpace = false; // set before loading; it has no effect afterwards
$dom->loadHTMLFile($html);
$xpath = new DOMXPath($dom);
$content = $xpath->query('//strong[@class="imvFse emp-fase"]');
foreach($content as $span)
{
echo "Key 1 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//strong[@class="emp-nome infNme colorTxt"]');
foreach($content as $span)
{
echo "Key 2 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part1 infLoc"]');
foreach($content as $span)
{
echo "Key 3 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part2 infLoc"]');
foreach($content as $span)
{
echo "Key 4 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtBed emp-un-dorms"]');
foreach($content as $span)
{
echo "Key 5 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtArea emp-un-area"]');
foreach($content as $span)
{
echo "Key 6 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtCar emp-un-park"]');
foreach($content as $span)
{
echo "Key 7 : ".$span->textContent."<br/>";
}
?>
I get the data this way:
Key 1 : Get_text 1
Key 1 : Other Get_text 1
Key 2 :
Key 2 :
Key 3 : Get_text 2
Key 3 : Other Get_text 2
Key 4 : Get_text 3
Key 4 : Other Get_text 3
Key 5 : Get_text 6
Key 5 : Other Get_text 6
Key 6 : Get_text 7
Key 6 : Other Get_text 7
Key 7 : Get_text 9
Key 7 : Other Get_text 9
In other words, each query runs over the whole document, so the output comes out grouped by key (k1, k1, k2, k2, ..., k7, k7). I would like the keys to come out sequentially for each block instead (k1, k2, ..., k7, k1, k2, ..., k7).
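One possible way to get the per-block ordering (k1 to k7 for the first block, then k1 to k7 for the next) is to loop over the lstImv blocks first and run each field query relative to the current block, by passing it as the second argument to DOMXPath::query and starting the expression with ".". The sketch below is only an illustration: it uses a trimmed inline copy of the markup instead of exemplo_parse.html, and $fields and $rows are names made up for the example.

```php
<?php
// Trimmed inline stand-in for exemplo_parse.html (assumption for illustration).
$html = <<<'HTML'
<div class="lstImv blackBd12">
  <strong class="imvFse emp-fase">Get_text 1</strong>
  <strong class="emp-nome infNme colorTxt"></strong>
  <span class="emp-loc-part1 infLoc">Get_text 2</span>
  <span class="emp-loc-part2 infLoc">Get_text 3</span>
  <ul>
    <li class="txtBed emp-un-dorms">Get_text 6</li>
    <li class="txtArea emp-un-area">Get_text 7</li>
    <li class="txtCar emp-un-park">Get_text 9</li>
  </ul>
</div>
<div class="lstImv blackBd12">
  <strong class="imvFse emp-fase">Other Get_text 1</strong>
  <strong class="emp-nome infNme colorTxt"></strong>
  <span class="emp-loc-part1 infLoc">Other Get_text 2</span>
  <span class="emp-loc-part2 infLoc">Other Get_text 3</span>
  <ul>
    <li class="txtBed emp-un-dorms">Other Get_text 6</li>
    <li class="txtArea emp-un-area">Other Get_text 7</li>
    <li class="txtCar emp-un-park">Other Get_text 9</li>
  </ul>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Key label => XPath relative to one block (note the leading ".").
$fields = [
    'Key 1' => './/strong[@class="imvFse emp-fase"]',
    'Key 2' => './/strong[@class="emp-nome infNme colorTxt"]',
    'Key 3' => './/span[@class="emp-loc-part1 infLoc"]',
    'Key 4' => './/span[@class="emp-loc-part2 infLoc"]',
    'Key 5' => './/li[@class="txtBed emp-un-dorms"]',
    'Key 6' => './/li[@class="txtArea emp-un-area"]',
    'Key 7' => './/li[@class="txtCar emp-un-park"]',
];

// Outer loop: one iteration per block. Inner loop: every key for that block.
$rows = [];
foreach ($xpath->query('//div[@class="lstImv blackBd12"]') as $block) {
    $row = [];
    foreach ($fields as $key => $expr) {
        // The second argument restricts the query to the current block.
        $node = $xpath->query($expr, $block)->item(0);
        $row[$key] = $node ? trim($node->textContent) : '';
    }
    $rows[] = $row;
}

// Print in the order k1..k7 for block 1, then k1..k7 for block 2.
foreach ($rows as $row) {
    foreach ($row as $key => $value) {
        echo $key . " : " . $value . "<br/>\n";
    }
}
```

Keeping the results in $rows (one associative array per block) also makes it easy to do something other than echo them later, such as encoding them as JSON or inserting them into a database.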