I'm using DOM in PHP to get the link from an 'a' tag. With "getAttribute" I can read that link from the href attribute.
Crawler script:
<?php
// Load the page
$dom = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings caused by malformed HTML
$dom->loadHTMLFile("http://www.linkdosite.com.br");
// Get only the <a> elements
$links = $dom->getElementsByTagName('a');
// Array that stores the links collected by the crawler
$getLink = array();
$nlinks = 0;
// Term that distinguishes the wanted links from the others; only links containing it are kept
$termo = 'detalhe';
// Pattern to search for in $link (preg_quote escapes any regex metacharacters in the term)
$pattern = '/' . preg_quote($termo, '/') . '/';
foreach ($links as $pegalink) {
    // Read each link's href attribute
    $link = $pegalink->getAttribute('href');
    if (preg_match($pattern, $link)) {
        $getLink[$nlinks] = $link; // Store the link in $getLink
        echo $getLink[$nlinks] . "<br>"; // Print the link on screen
        $nlinks++;
    }
}
Now I also need to get the string inside the 'a' tag, but I haven't found any examples that show how to do this.
This is the block the crawler returns:
<a href="link">
<font style="font-size: 14px;" color="black" face="arial"><b>String que eu quero pegar</b></font>
</a>
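One possible approach, sketched under the assumption that the block above is representative of the page: `DOMNode::$textContent` returns the concatenated text of an element and all of its descendants, so the nested `<font>` and `<b>` tags are flattened automatically and only need a `trim()` to remove the surrounding line breaks. To keep the sketch runnable offline, it parses the fragment from a string with `loadHTML()` instead of fetching a URL; `$html`, `$anchor` and `$results` are names introduced here for illustration only.

```php
<?php
// Sketch only: the fragment below is the block from the question,
// used in place of a real page so the example runs without network access.
$html = '<a href="link">
<font style="font-size: 14px;" color="black" face="arial"><b>String que eu quero pegar</b></font>
</a>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // suppress parser warnings for non-well-formed HTML
$dom->loadHTML($html);
libxml_clear_errors();

$results = array();
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $href = $anchor->getAttribute('href');
    // textContent joins the text of every descendant node (<font>, <b>, ...),
    // so the nested markup does not have to be walked by hand.
    $text = trim($anchor->textContent);
    $results[] = array('href' => $href, 'text' => $text);
    echo $href . ' => ' . $text . "<br>";
}
```

In the crawler loop, the same idea amounts to reading `trim($pegalink->textContent)` alongside `$link` inside the `if` block.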