I'm writing a parser for a website and want to extract some data from it. The data is structured as follows:
<div class="interesses">
<span class="tipo" >Tipo 1</span>
<span class="tipo" >Tipo 1</span>
<span class="tipo" >Tipo 2</span>
<span class="tipo" >Tipo 2</span>
<span class="tipo" >Tipo 3</span>
<span class="tipo" >Tipo 3</span>
</div>
I want to get the text of each span with the class tipo, so I used DOMDocument with XPath:
$html = file_get_contents("http://exemplo.com");
$DOM = new DOMDocument();
$DOM->loadHTML('<meta charset="utf-8">'.$html);
$xpath = new DomXpath($DOM);
$tipo = $xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " tipo ")]');
// iterator_to_array() turns the DOMNodeList into a plain array of DOMElement objects
$arrValues = iterator_to_array($tipo);
foreach ($arrValues as $value) {
    echo $value->nodeValue . "<br />";
}
It works!
But the problem is that the source page, as you can see, contains two "Tipo 1", two "Tipo 2", and so on: the site always generates duplicate entries. I only want to show one of each, that is, a single "Tipo 1", a single "Tipo 2", and so on. Right now every entry is printed, and I have no idea how to avoid the duplicates.
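For reference, this is roughly the behaviour I am after, as a minimal sketch run against the sample markup above instead of the real page. The names $values and $unique are just illustrative, and I am assuming that comparing the trimmed text of each span is enough to spot a duplicate:

```php
<?php
// Sample markup copied from the question; the real code would fetch the page instead.
$html = '<div class="interesses">'
      . '<span class="tipo" >Tipo 1</span><span class="tipo" >Tipo 1</span>'
      . '<span class="tipo" >Tipo 2</span><span class="tipo" >Tipo 2</span>'
      . '<span class="tipo" >Tipo 3</span><span class="tipo" >Tipo 3</span>'
      . '</div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output

$dom = new DOMDocument();
$dom->loadHTML('<meta charset="utf-8">' . $html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//span[contains(concat(" ", normalize-space(@class), " "), " tipo ")]');

// Collect the trimmed text of every matching span.
$values = [];
foreach ($nodes as $node) {
    $values[] = trim($node->nodeValue);
}

// array_unique() keeps the first occurrence of each value;
// array_values() reindexes the result from 0.
$unique = array_values(array_unique($values));

foreach ($unique as $value) {
    echo $value . "<br />\n";
}
```

Deduplicating the extracted strings rather than the DOM nodes keeps the comparison simple, since two distinct nodes are never equal even when their text is the same.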
Update:
The approach that @Miguel Angelo suggested worked! But now imagine the following scenario: there is an online bakery that sells sweet bread in two varieties, with coconut and without coconut. The buyer chooses two loaves, one with coconut and one without, and the HTML structure would look something like this:
<div class="interesses">
<span class="tipo" >Pão Doce</span>
<span class="tipo" >Com coco</span>
<span class="tipo" >Pão Doce</span>
<span class="tipo" >Com coco</span>
<span class="tipo" >Pão Doce</span>
<span class="tipo" >Sem coco</span>
<span class="tipo" >Pão Doce</span>
<span class="tipo" >Sem coco</span>
</div>
I now want to show the user only the 2 types of bread they requested:
Item 1: Sweet bread with coconut, Item 2: Sweet bread without coconut.
Using array_unique would return something like:
Item 1: Sweet bread with coconut, Item 2: Sweet bread with coconut, Item 3: Sweet bread without coconut, Item 4: Sweet bread without coconut.
If I use the DOM approach from @Miguel Angelo's tip, each "tipo" value is shown only once, that is:
Item 1: Sweet bread with coconut, Item 2: Without coconut.
In other words, when there are two breads of the same type, I get either every entry or only one of each value. What I want is to display each group once: keep one "Sweet bread with coconut" and drop its duplicate, and likewise keep one "Sweet bread without coconut" and drop its duplicate.
Is there any way to do this?
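To make the goal concrete, here is a rough sketch of the grouping I have in mind. It assumes the spans always arrive in consecutive pairs (bread name, then variety), which the sample markup suggests but the real site may not guarantee. The names $values, $items and $label are only illustrative:

```php
<?php
// Sample markup copied from the bakery example above.
$html = '<div class="interesses">'
      . '<span class="tipo" >Pão Doce</span><span class="tipo" >Com coco</span>'
      . '<span class="tipo" >Pão Doce</span><span class="tipo" >Com coco</span>'
      . '<span class="tipo" >Pão Doce</span><span class="tipo" >Sem coco</span>'
      . '<span class="tipo" >Pão Doce</span><span class="tipo" >Sem coco</span>'
      . '</div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output

$dom = new DOMDocument();
$dom->loadHTML('<meta charset="utf-8">' . $html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//span[contains(concat(" ", normalize-space(@class), " "), " tipo ")]');

$values = [];
foreach ($nodes as $node) {
    $values[] = trim($node->nodeValue);
}

// Assumption: every two consecutive spans describe one item (name, variety).
// Using the joined pair as the array key collapses repeated pairs.
$items = [];
foreach (array_chunk($values, 2) as $pair) {
    $label = implode(' ', $pair);
    $items[$label] = $pair;
}

$i = 1;
foreach (array_keys($items) as $label) {
    echo "Item $i: $label<br />\n";
    $i++;
}
```

The point is that the duplicate check runs on the whole pair, so "Pão Doce" can repeat across items while an identical name and variety combination is shown only once.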