I would like to integrate a Facebook-like link preview system into my project.
For example, when I post a link such as "www.un-site-qualquer.com" on my site, I would like to get a result like the picture below.
You can use cURL to fetch the page and then DOMDocument (or a regex) to extract the page data.
Facebook uses Open Graph markup, and since many websites support it, you can read that data as well.
As an example I'm using
http://g1.globo.com/rj/sul-do-rio-costa-verde/noticia/2017/01/acidente-com-teori-zavascki-aviao-comeca-ser-retirado-do-mar.html
, which is the latest news story on Globo.com at the time of writing.
From this page you can extract the og:image, og:title and og:description meta tags. In addition, almost every website has the default meta tags, or is at least expected to have a description and a title.
For example, based on an answer to another question:
// Fetch the page HTML
$ch = curl_init('http://g1.globo.com/rj/sul-do-rio-costa-verde/noticia/2017/01/acidente-com-teori-zavascki-aviao-comeca-ser-retirado-do-mar.html');
curl_setopt_array($ch, [
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_SSL_VERIFYHOST => 2,
    CURLOPT_SSL_VERIFYPEER => true,
    CURLOPT_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
    CURLOPT_REDIR_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
    CURLOPT_TIMEOUT => 5,
    CURLOPT_MAXREDIRS => 2
]);
$html = curl_exec($ch);
curl_close($ch);

// curl_exec() returns false on failure, so stop here in that case
if ($html === false) {
    exit('Could not fetch the page.');
}

// Start the DOM and XPath. Real-world HTML is rarely valid,
// so libxml warnings are collected instead of being printed.
libxml_use_internal_errors(true);
$DOM = new DOMDocument;
$DOM->loadHTML($html);
$XPath = new DOMXPath($DOM);

// Properties to look for
$propriedades = ['description', 'title', 'type', 'image'];
$conteudo = [];

// Check each item of the array
foreach ($propriedades as $propriedade) {
    $Meta = $XPath->query('//head//meta[(@property="og:'.$propriedade.'") or (@name="'.$propriedade.'")] | //head//'.$propriedade);
    // If the element is found, store its value
    if ($Meta->length !== 0) {
        $conteudo[$propriedade] = $Meta->item(0)->getAttribute('content') !== '' ? $Meta->item(0)->getAttribute('content') : $Meta->item(0)->nodeValue;
    }
}

var_dump($conteudo);
Result:
array(4) {
["description"]=>
string(134) "Serviço de remoção aconteceu no início da noite deste domingo (22).
Retirada foi feita por empresa contratada pelo Grupo Emiliano."
["title"]=>
string(73) "Acidente com Teori Zavascki: Avião que caiu em Paraty é retirado do mar"
["type"]=>
string(7) "article"
["image"]=>
string(122) "http://s2.glbimg.com/IAaOKflQpOoOSoi7pGNjkmirtjI=/1200x630/filters:max_age(3600)/s02.video.glbimg.com/deo/vi/65/44/5594465"
}
With this information you can assemble the HTML as you wish.
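As a rough illustration of that last step, here is a minimal sketch that turns the extracted array into a preview card. The function name renderPreview, the link-preview class and the markup itself are assumptions made for this example, not part of the original answer. It also assumes $url was already validated as an http or https address. Every value is escaped because it comes from a third-party page.

```php
<?php
// Hypothetical helper: builds an HTML preview card from the extracted metadata.
function renderPreview(array $conteudo, string $url): string
{
    // The values come from an external page, so escape all of them.
    $e = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // Fall back to the URL itself when the page has no title.
    $title = $e($conteudo['title'] ?? $url);
    $description = $e($conteudo['description'] ?? '');
    // Only emit the <img> tag when an image was found.
    $image = isset($conteudo['image'])
        ? '<img src="' . $e($conteudo['image']) . '" alt="">'
        : '';

    return '<a class="link-preview" href="' . $e($url) . '">'
        . $image
        . '<strong>' . $title . '</strong>'
        . '<p>' . $description . '</p>'
        . '</a>';
}
```

You would call it as echo renderPreview($conteudo, $link); where $link is the address the user posted.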
Explanations:
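cURL:
CURLOPT_FOLLOWLOCATION makes cURL follow a Location: header if the server sends one, and CURLOPT_RETURNTRANSFER is required so that curl_exec() returns the page instead of printing it. CURLOPT_SSL_VERIFYHOST and CURLOPT_SSL_VERIFYPEER are enabled in the code above, so the server certificate is verified. Turning them off would let you get the information even from a server that has a self-signed certificate, for example, but it also removes the protection against tampered responses. CURLOPT_PROTOCOLS and CURLOPT_REDIR_PROTOCOLS restrict the request and its redirects to HTTP and HTTPS, while CURLOPT_TIMEOUT and CURLOPT_MAXREDIRS set the timeout (5 seconds) and the maximum number of redirects (2).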
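Since the request can fail (timeout, too many redirects, a rejected certificate, a disallowed protocol), it may help to wrap the call in a function that reports failure. The sketch below reuses the same options as the code above. The name fetchHtml and the choice to treat any non-2xx status as a failure are assumptions for illustration.

```php
<?php
// Hypothetical wrapper around the cURL call; returns null on any failure.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_SSL_VERIFYHOST => 2,
        CURLOPT_SSL_VERIFYPEER => true,
        CURLOPT_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
        CURLOPT_REDIR_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
        CURLOPT_TIMEOUT => 5,
        CURLOPT_MAXREDIRS => 2,
    ]);
    $html = curl_exec($ch);
    // Status of the final response, after redirects (0 if no HTTP response arrived).
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // curl_exec() returns false on transport errors; also reject non-2xx pages.
    if ($html === false || $status < 200 || $status >= 300) {
        return null;
    }
    return $html;
}
```

Because of CURLOPT_PROTOCOLS, a call such as fetchHtml('file:///etc/hostname') is refused and returns null, which matters when the URL comes from user input.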
XPath:
The information is fetched with the following query:
//head//meta[(@property="og:'.$propriedade.'") or (@name="'.$propriedade.'")] | //head//'.$propriedade
This query matches all of the cases below:
<head>
<description>Valor</description>
<meta name="description" content="Valor" />
<meta property="og:description" content="Valor" />
</head>
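To see which branch of the query matches what, you can run it against a small hand-written page. This is only a sketch. The HTML and the $found array are made up for the example, and it counts how many nodes the query returns for each property.

```php
<?php
// Made-up page: a <title> element, a plain meta description and an og:image.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Page title</title>
<meta name="description" content="Plain description">
<meta property="og:image" content="http://example.com/cover.png">
</head>
<body></body>
</html>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$DOM = new DOMDocument;
$DOM->loadHTML($html);
$XPath = new DOMXPath($DOM);

// Number of nodes matched by the query for each property.
$found = [];
foreach (['description', 'title', 'image', 'type'] as $propriedade) {
    $Meta = $XPath->query('//head//meta[(@property="og:'.$propriedade.'") or (@name="'.$propriedade.'")] | //head//'.$propriedade);
    $found[$propriedade] = $Meta->length;
}
print_r($found);
```

Here description is matched through the name attribute, title through the title element itself, image through the og: property, and type is not found at all.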
To check whether the query found anything, this is used:
$Meta->length !== 0
Since the value can be in the content attribute (as in the last two examples) or inside the tag itself (as in the first example), this is used:
$conteudo[$propriedade] = $Meta->item(0)->getAttribute('content') !== '' ? $Meta->item(0)->getAttribute('content') : $Meta->item(0)->nodeValue;
This checks whether the content attribute has any data in it (getAttribute() returns an empty string both when the attribute is missing and when it is empty); otherwise it takes the text value of the element.
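The same fallback can be pulled out into a small function so the ternary is not repeated. This is a sketch only. The name metaValue is hypothetical, and the trim() call is an addition that the original code does not have.

```php
<?php
// Hypothetical helper: prefers the content attribute, falls back to the element text.
function metaValue(DOMElement $node): string
{
    $content = $node->getAttribute('content'); // '' when missing or empty
    return trim($content !== '' ? $content : $node->nodeValue);
}

libxml_use_internal_errors(true);
$DOM = new DOMDocument;
$DOM->loadHTML('<html><head><title> Page title </title><meta name="description" content="Plain description"></head><body></body></html>');

// <title> has no content attribute, so its text is used.
$title = metaValue($DOM->getElementsByTagName('title')->item(0));
// The meta tag carries its value in the content attribute.
$description = metaValue($DOM->getElementsByTagName('meta')->item(0));
```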