I have a WordPress plugin that does something similar, using the Open Graph metadata that many web pages include today.
Example of a video on YouTube:
<meta property="og:site_name" content="YouTube">
<meta property="og:url" content="http://www.youtube.com/watch?v=aZMbTFNp4wI">
<meta property="og:title" content="No Woman, No Drive">
<meta property="og:image" content="https://i1.ytimg.com/vi/aZMbTFNp4wI/maxresdefault.jpg">
<meta property="og:description" content="Download directly from us: http://ldr.fm/tX6XP Download from iTunes: https://itun.es/i6F668z Follow: Hisham Fageeh: http://Twitter.com/HishamFageeh Fahad Alb...">
<meta property="og:type" content="video">
<meta property="og:video" content="http://www.youtube.com/v/aZMbTFNp4wI?version=3&autohide=1">
<meta property="og:video:type" content="application/x-shockwave-flash">
<meta property="og:video:width" content="1920">
<meta property="og:video:height" content="1080">
In addition to OG, there are Twitter Cards and other social metadata:
<meta name="twitter:url" content="http://www.youtube.com/watch?v=aZMbTFNp4wI">
<meta property="al:android:url" content="http://www.youtube.com/watch?v=aZMbTFNp4wI">
<meta property="al:ios:app_name" content="YouTube">
The process in the plugin is:
- an <input> text field where the user pastes the link,
- an AJAX request is fired to a PHP function,
- PHP fetches the URL and parses the content, extracting the OG and Twitter information, from which it takes the title, description, and representative image,
- PHP returns the information as JSON to JavaScript, which renders the received content using jQuery.
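The step that reduces the extracted tags to a title, description, and image could look roughly like the sketch below. It is only an illustration: the function name build_preview and the fallback order (OG first, then Twitter) are assumptions, not the plugin's actual code, and the description value is made up for the example.

```php
<?php
// Hypothetical helper (not from the plugin): reduce the extracted
// "tag => content" pairs to the three fields the preview needs.
function build_preview( array $metas ) {
    // Return the first non-empty value among the candidate keys, or null.
    $pick = function ( array $keys ) use ( $metas ) {
        foreach ( $keys as $key ) {
            if ( ! empty( $metas[ $key ] ) ) {
                return $metas[ $key ];
            }
        }
        return null;
    };

    // Assumed priority: Open Graph first, Twitter Cards as a fallback.
    return array(
        'title'       => $pick( array( 'og:title', 'twitter:title' ) ),
        'description' => $pick( array( 'og:description', 'twitter:description' ) ),
        'image'       => $pick( array( 'og:image', 'twitter:image' ) ),
    );
}

// Usage: tags as they might come out of the extraction step.
$metas = array(
    'og:title'            => 'No Woman, No Drive',
    'og:image'            => 'https://i1.ytimg.com/vi/aZMbTFNp4wI/maxresdefault.jpg',
    'twitter:description' => 'Example description', // made-up value
);
$preview = build_preview( $metas );
$json    = json_encode( $preview ); // what would be sent back to JavaScript
```

Inside WordPress the array would normally be handed to wp_send_json_success() instead of calling json_encode() directly.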
The following code is part of the function that the AJAX request invokes; it processes the result of fetching the URL. WordPress performs HTTP requests using PHP's cURL extension or streams (depending on the case). Either way, I only need to call the function wp_remote_get, which returns the result, and then I process $response['body']:
if ( $data = wp_remote_retrieve_body( $response ) )
{
    $rmetas = array(); // Collected metadata, returned as JSON
    libxml_use_internal_errors( true ); // Silence warnings from malformed HTML
    $doc = new DOMDocument();
    $doc->loadHTML( $data );
    $xpath = new DOMXPath( $doc );

    // Every <meta> tag whose property starts with "og:"
    $query = '//*/meta[starts-with(@property, \'og:\')]';
    $ogs = $xpath->query( $query );
    foreach ( $ogs as $meta )
    {
        $property = $meta->getAttribute( 'property' );
        $content = $meta->getAttribute( 'content' );
        $rmetas[$property] = $content;
    }
    if( empty( $rmetas ) )
        wp_send_json_error( array( 'error' => __( 'No OG data in the page.' ) ) );

    /* Meta Data for the post */
    if( $autor = $this->xpath_query( $xpath, 'meta', 'name', 'author', 'content' ) )
        $rmetas['author'] = $autor;
    if( $date = $this->xpath_query( $xpath, 'meta', 'name', 'dc.date', 'content' ) )
        $rmetas['date'] = $date;
    if( $url = $this->xpath_query( $xpath, 'link', 'rel', 'shorturl', 'href' ) )
        $rmetas['shorturl'] = $url;
    if( $aurl = $this->xpath_query( $xpath, 'meta', 'property', "article:author", 'content' ) )
        $rmetas['authorurl'] = $aurl;

    // Twitter Cards normally use name="twitter:..." with a content attribute
    // (as in the example above); property= and value= are also accepted here.
    $twits = $xpath->query( '//*/meta[starts-with(@name, \'twitter:\') or starts-with(@property, \'twitter:\')]' );
    foreach ( $twits as $meta )
    {
        $property = $meta->getAttribute( 'name' ) ?: $meta->getAttribute( 'property' );
        $content = $meta->getAttribute( 'content' ) ?: $meta->getAttribute( 'value' );
        if( 'twitter:site' == $property )
            $rmetas['twitter'] = $content;
    }
    wp_send_json_success( $rmetas ); // This function calls die(), so the error below is not reached
}
wp_send_json_error( array( 'error' => __( 'Undefined error.' ) ) );
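The code above calls $this->xpath_query(), which is not shown in this answer. Judging only from how it is called (an XPath object, a tag name, an attribute to match, the value to match, and the attribute to return), a minimal stand-in might look like the sketch below. The name xpath_query_sketch and the choice to return false when nothing is found are assumptions, not the plugin's real method.

```php
<?php
// Hypothetical stand-in for the plugin's xpath_query() method.
// Finds the first <$tag> whose $attr equals $value and returns its
// $return_attr attribute, or false when there is no usable match.
function xpath_query_sketch( DOMXPath $xpath, $tag, $attr, $value, $return_attr ) {
    // Assumes $tag, $attr and $value contain no double quotes.
    $query = sprintf( '//%s[@%s="%s"]', $tag, $attr, $value );
    $nodes = $xpath->query( $query );
    if ( false === $nodes || 0 === $nodes->length ) {
        return false;
    }
    $result = $nodes->item( 0 )->getAttribute( $return_attr );
    return '' !== $result ? $result : false;
}

// Usage with a small made-up page.
libxml_use_internal_errors( true );
$doc = new DOMDocument();
$doc->loadHTML(
    '<html><head>'
    . '<meta name="author" content="Jane Doe">'
    . '<link rel="shorturl" href="http://example.com/x">'
    . '</head><body></body></html>'
);
$xpath = new DOMXPath( $doc );

$author  = xpath_query_sketch( $xpath, 'meta', 'name', 'author', 'content' );
$short   = xpath_query_sketch( $xpath, 'link', 'rel', 'shorturl', 'href' );
$missing = xpath_query_sketch( $xpath, 'meta', 'name', 'dc.date', 'content' );
```

Returning false on a miss is what lets the if( $autor = ... ) pattern in the code above skip tags that are absent from the page.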
I wrote the DOMXPath queries with the help of Stack Overflow, simply searching with this advanced search until I found something suitable.
If the meta information is not present on the page, you could fall back to traditional scraping, but I never got to that point. I do see some interesting things here (----->) in the Related column: