How can I capture a favicon from a website via PHP?

5

I load external content from a website and then import it into DOMDocument.

I can currently capture information from the title tag easily.

I do this:

$dom = new DOMDocument();

@$dom->loadHtml('<?xml encoding="UTF-8" ?>' . $conteudo_html);

$title = $dom->getElementsByTagName('title')->item(0)->nodeValue;

However, I would also like to capture the favicon of this content.

How could I do this?

Note: If there is a way to do this with DOMDocument, even better.

    
asked by anonymous 07.10.2015 / 15:11

5 answers

3

You can find it through rel="shortcut icon": get all the link tags ( $dom->getElementsByTagName('link'); ), check each one's rel ( $itens->item($i)->getAttribute('rel') === 'shortcut icon' ), and push the matches into an array. For sites that declare several icons, you only need to adapt the same logic.

<?php

    // site address
    $site = '';

    $conteudo_html = file_get_contents($site);

    $dom = new DOMDocument();

    @$dom->loadHtml('<?xml encoding="UTF-8" ?>' . $conteudo_html);

    $itens = $dom->getElementsByTagName('link');
    $count = $itens->length;


    $finds = array();

    for($i = 0; $i < $count; $i++)
    {

        if ($itens->item($i)->getAttribute('rel') === 'shortcut icon')
        {

            array_push($finds, [
                'tag' => 'link', 
                'href' => $itens->item($i)->getAttribute('href'),
                'id' => 'shortcut icon',                
                'type' => $itens->item($i)->getAttribute('type'),
                ]
            );

        }

    }

    // items found
    var_dump($finds);
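
The strict comparison with 'shortcut icon' skips pages that declare rel="icon" or write the value in a different letter case. Below is a minimal sketch of a more tolerant check, assuming it is acceptable to treat rel as a space-separated list of tokens. The function name findIconLinks and the sample HTML are hypothetical and only serve as illustration.

```php
<?php
// Hypothetical helper (not part of the answer above): collects the link tags
// whose rel attribute contains the token "icon", in any letter case.
function findIconLinks(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<?xml encoding="UTF-8" ?>' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $finds = [];
    foreach ($dom->getElementsByTagName('link') as $link) {
        // rel is a space-separated token list, e.g. "shortcut icon"
        $tokens = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        if (in_array('icon', $tokens, true)) {
            $finds[] = [
                'href' => $link->getAttribute('href'),
                'rel'  => $link->getAttribute('rel'),
                'type' => $link->getAttribute('type'),
            ];
        }
    }
    return $finds;
}

// Sample markup invented for this example
$sample = '<html><head><link rel="Shortcut Icon" href="/favicon.ico">'
        . '<link rel="stylesheet" href="/a.css"></head><body></body></html>';
var_dump(findIconLinks($sample));
```

Here libxml_use_internal_errors() replaces the @ operator, so malformed markup does not emit warnings while other errors stay visible.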
    
07.10.2015 / 15:44
1

Here is a very simple example based on the one available on PHP.net, modified to handle errors and wrapped in a function so it can be reused.

function getUrl($url){
    $doc = new DOMDocument;
    // Errors are suppressed here on purpose
    if(!@$doc->loadHTMLFile($url)){
        $err="";    
        $erros = libxml_get_errors();
        foreach($erros as $erro){
            $err .= $erro->message; 
        }   
        return $err;
    } else {
        $xml = simplexml_import_dom($doc);
        $arr = $xml->xpath('//link[@rel="shortcut icon"]');
        // xpath() yields an empty array when no link matches
        return empty($arr) ? '' : (string) $arr[0]['href'];
    }

}

// Enable libxml's internal error handling
libxml_use_internal_errors(true);

echo getUrl("http://pt.stackoverflow.com");

References:

XML - PHP.net

Favicon Class - Controlstyle

How to get favicon from websites using PHP - SOen

    
07.10.2015 / 16:14
1

Using Stack Overflow as an example:

<link rel="shortcut icon" href="//cdn.sstatic.net/br/img/favicon.ico?v=c6678b633455">


$html = file_get_contents('http://pt.stackoverflow.com');

$dom = new DOMDocument();
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$favicon = $xpath->evaluate("//link[@rel='shortcut icon']");

print_r($favicon->item(0)->getAttribute('href'));

This returns:

//cdn.sstatic.net/br/img/favicon.ico?v=c6678b633455
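
The href above is protocol-relative, so it cannot be downloaded until it is combined with the address of the page it came from. The following is a rough sketch of that step, assuming the page URL is known. resolveFaviconUrl is a hypothetical helper, and it deliberately ignores ports, <base> tags and "../" segments.

```php
<?php
// Hypothetical helper: turns the href found in the page into an absolute URL.
// Simplified on purpose: ports, <base href> and "../" segments are not handled.
function resolveFaviconUrl(string $href, string $pageUrl): string
{
    $base   = parse_url($pageUrl);
    $scheme = $base['scheme'] ?? 'http';
    $host   = $base['host'] ?? '';

    // Already absolute (http:, https:, data:, ...)
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href;
    }
    // Protocol-relative: reuse the scheme of the page
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href;
    }
    // Root-relative: prepend scheme and host
    if (strpos($href, '/') === 0) {
        return $scheme . '://' . $host . $href;
    }
    // Path-relative: resolve against the directory of the page
    $path = $base['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $scheme . '://' . $host . $dir . $href;
}

echo resolveFaviconUrl(
    '//cdn.sstatic.net/br/img/favicon.ico?v=c6678b633455',
    'http://pt.stackoverflow.com'
);
// prints http://cdn.sstatic.net/br/img/favicon.ico?v=c6678b633455
```

The resulting URL can then be passed to file_get_contents() to download the icon itself.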

    
07.10.2015 / 16:18
1

I suggest using the Simple HTML DOM library:

Example:

<?php
include("simple_html_dom.php");
$html = file_get_html('http://pt.stackoverflow.com/');

echo $html->find('link[rel="shortcut icon"]', 0)->href;

Output: //cdn.sstatic.net/br/img/favicon.ico?v=c6678b633455

    
07.10.2015 / 17:30
0

One idea is to first fetch the site's HTML using cURL or file_get_contents(), then inject a script that finds the favicon in the browser and sends it back through the img query parameter:

  <?php

if (isset($_GET['img'])) {
    $favicon = $_GET['img'];
    print_r(array('favicon'=>$favicon));
  die(); 
}

function capturarFaviconSite($url_metodo) {

$script = "\n" . '<script>' .
                 'function captureFavicon() {
                     var favicon = "";
                     // Matches rel values such as "icon", "shortcut icon" and "apple-touch-icon"
                     var expCheck = /(^|\s)(icon|apple-touch-icon)(\s|$)/i;
                     var all = document.querySelectorAll("link[rel][href]");
                     // Index-based loop: for...in would also visit non-element keys of the NodeList
                     for (var i = 0; i < all.length; i++) {
                         if (expCheck.test(all[i].getAttribute("rel"))) {
                             favicon = all[i].getAttribute("href");
                             break;
                         }
                     }
                     if (favicon !== "") {
                         location.href = "?img=" + encodeURIComponent(favicon);
                     }
                 }';

$html = file_get_contents($url_metodo);
        return preg_replace('/<\/head>/',$script . 'captureFavicon();'."\n".'</script></head>',$html);
}

echo capturarFaviconSite('http://www.uol.com.br');

In the code above, the favicon is found by a JavaScript function injected into the page:

function captureFavicon() {

  var favicon = "";
  // Matches rel values such as "icon", "shortcut icon" and "apple-touch-icon"
  var expCheck = /(^|\s)(icon|apple-touch-icon)(\s|$)/i;
  var all = document.querySelectorAll('link[rel][href]');
  // Index-based loop: for...in would also visit non-element keys of the NodeList
  for (var i = 0; i < all.length; i++) {
       if (expCheck.test(all[i].getAttribute('rel'))) {
           favicon = all[i].getAttribute('href');
           break;
       }
  }
return favicon;
}
    
07.10.2015 / 19:58