Get string inside tag a without attributes [duplicate]

0

I'm using DOM in PHP to get the link from an a tag; with getAttribute I can read that link from the href attribute.

Crawler script:

<?php
// load the URL
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTMLFile("http://www.linkdosite.com.br");

// get only the links
$links = $dom->getElementsByTagName('a');

// array that stores the values collected by the crawler
$getLink = array();

$nlinks = 0;

foreach ($links as $pegalink) {

    // get each link here
    $link = $pegalink->getAttribute('href');

    $termo = 'detalhe'; // term used to tell these links apart from the others and keep only those containing it

    $pattern = '/' . $termo . '/'; // pattern to look for in the string $link

    if (preg_match($pattern, $link)) {
        $getLink[$nlinks] = $link; // store the link in the $getLink array

        echo $getLink[$nlinks]."<br>"; // print the link on screen

        $nlinks++;
    }

}

Now I also need to get the string that is inside the 'a' tag. I did not find any examples to help me solve this.

Block that the crawler retrieves:

<a href="link">
  <font style="font-size: 14px;" color="black" face="arial"><b>String que eu quero pegar</b></font>
</a>
asked by anonymous 17.08.2017 / 21:50

3 answers

3

To retrieve the value of an attribute, or the string inside a tag, do the following:

Example:

// load the URL
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTMLFile("http://google.com.br");

// get only the links
$links = $dom->getElementsByTagName('a');
foreach ($links as $link) {
    // Get the text inside the tag
    echo $link->nodeValue, PHP_EOL;
    // Get the value of an attribute
    echo $link->getAttribute('href'), PHP_EOL;
}

The PHP documentation has a user-contributed note with an example.
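To tie this back to the question, here is a minimal sketch that keeps the 'detalhe' filter and stores the text of each matching link next to its href. The inline markup and the /detalhe/1 and /contato paths are made up for illustration; in the real crawler the document would come from loadHTMLFile as above.

```php
<?php
// Hypothetical sample markup standing in for the crawled page.
$html = '<a href="/detalhe/1">
  <font style="font-size: 14px;" color="black" face="arial"><b>String que eu quero pegar</b></font>
</a>
<a href="/contato">Contato</a>';

libxml_use_internal_errors(true); // silence warnings about non-standard HTML
$dom = new DOMDocument();
$dom->loadHTML($html);

$found = array();
foreach ($dom->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    // Keep only links whose href contains the term used in the question.
    if (strpos($href, 'detalhe') !== false) {
        // textContent concatenates the text of all descendant nodes; trim()
        // drops the whitespace left by the indentation inside the tag.
        $found[$href] = trim($a->textContent);
    }
}

print_r($found);
```

The array is keyed by href here only to keep the pair together; two parallel arrays, as in the question, would work just as well.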

    
18.08.2017 / 10:06
2

You can use strip_tags:

<?php
$text = '<a href="link">
  <font style="font-size: 14px;" color="black" face="arial"><b>String que eu quero pegar</b></font>
</a>
';
echo strip_tags($text);

See Ideone

There are many ways to achieve the result you are looking for: with DOMXPath (as @Lacobus said, and as mentioned in the link I sent you) or with DOM directly. But this kind of thing (scraping) is very specific, because it depends on the structure of the target page.

The most universal way would be as follows:

<?php
$str = file_get_contents("https://pt.stackoverflow.com/questions/229996/pegar-string-dentro-de-tag-a-sem-atributos");
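// \b stops tags such as <abbr> from matching; the "s" modifier lets "." span line breaks, "i" ignores case
preg_match_all('/<a\b[^>]*>(.*?)<\/a>/is', $str, $matches);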
print_r($matches[1]);
?>

If this does not work, post the link to the site so we can look at its structure.
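Note that what the regular expression captures is everything between the opening and closing a tags, so for the block in the question it still contains the font and b markup. A possible sketch of combining it with strip_tags to end up with plain text follows; the inline sample, including the abbr element, is an assumption added for illustration.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$str = '<a href="link">
  <font style="font-size: 14px;" color="black" face="arial"><b>String que eu quero pegar</b></font>
</a>
<abbr title="x">not a link</abbr>';

// \b keeps <abbr> and similar tags from matching; the "s" modifier lets "."
// cross line breaks and "i" accepts <A ...> as well.
preg_match_all('/<a\b[^>]*>(.*?)<\/a>/is', $str, $matches);

// Each capture still holds the inner markup, so strip the tags and trim.
$texts = array_map(function ($inner) {
    return trim(strip_tags($inner));
}, $matches[1]);

print_r($texts);
```

A regular expression like this is fragile on real pages (nested or malformed markup, attributes containing ">"), so it is probably best kept for quick one-off jobs.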

    
17.08.2017 / 22:20
1

Use the evaluate() method of the DOMXPath class:

<?php

$html = "<a href=\"link\"><font style=\"font-size: 14px;\" color=\"black\" face=\"arial\"><b>String que eu quero pegar</b></font></a>";

$dom = new DOMDocument();

$dom->loadXML($html); // works here because this snippet happens to be well-formed XML

$xp = new DOMXPath($dom);

$str = $xp->evaluate("string(/a)");

echo $str;
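
On a full page, which is rarely well-formed XML, the same idea can be sketched with loadHTML and an XPath query that selects only the links of interest. The sample markup, the /detalhe/1 and /contato paths, and the choice of normalize-space() to collapse whitespace are assumptions made for this example:

```php
<?php
// Hypothetical sample markup standing in for the crawled page.
$html = '<p><a href="/detalhe/1">
  <font color="black"><b>String que eu quero pegar</b></font>
</a> <a href="/contato">Contato</a></p>';

libxml_use_internal_errors(true); // silence warnings about non-standard HTML
$dom = new DOMDocument();
$dom->loadHTML($html);

$xp = new DOMXPath($dom);
$result = array();
// Select only the <a> elements whose href contains "detalhe".
foreach ($xp->query("//a[contains(@href, 'detalhe')]") as $a) {
    // Evaluate relative to the current <a>: normalize-space(.) returns its
    // text with surrounding and repeated whitespace collapsed.
    $result[$a->getAttribute('href')] = $xp->evaluate('normalize-space(.)', $a);
}

print_r($result);
```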
    
17.08.2017 / 22:50