Extract data from a Facebook profile by email

2

I need to check whether a Facebook profile exists for a given email address.

I noticed that the API offers no way to do this.

However, the Facebook site has this URL:

link

If I replace the @ with a valid email address, it finds the profile.

The question would be:

1 - How can I use file_get_contents in PHP to access this URL dynamically and get the profile's name and photo?

Note that when I open the URL in the browser with a valid email address, it shows the name, profile photo, etc.

Thank you

    
asked by anonymous 15.10.2016 / 21:47

1 answer

0
<?php
date_default_timezone_set('Asia/Tokyo');

ini_set('error_reporting', E_ALL & ~E_STRICT & ~E_DEPRECATED); // & ~E_NOTICE
ini_set('log_errors', true);
ini_set('html_errors', false);
ini_set('display_errors', true);

define('CHARSET', 'UTF-8');

ini_set('default_charset', CHARSET);
mb_http_output(CHARSET);
mb_internal_encoding(CHARSET);
mb_regex_encoding(CHARSET);

header('Content-Type: text/html; charset='.CHARSET);


/*
The relevant part starts here. The section above is only bootstrap code.
*/

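$email = 'user@example.com'; // placeholder: the email address to search for
$url = 'https://www.facebook.com/search/all/?q='.urlencode($email);
$data = file_get_contents($url);
if ($data === false) {
    exit('Could not fetch '.$url);
}
$data = html_entity_decode($data);
// The result markup arrives wrapped in HTML comments; strip the markers so it can be parsed.
$data = str_replace(array('<!-- ', ' -->'), '', $data);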

class Foo {

    private $data;
    private $dom;

    public function __construct($data) {
        $this->data = $data;
        $this->dom = new DOMDocument();
        $this->dom->validateOnParse = false;
        $this->dom->preserveWhiteSpace = true;
    }

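    public function htmlGetContentBySelector($query, $data = null) {
        if (!empty($data)) {
            $this->data = $data;
        }
        // Collect libxml parse warnings internally instead of printing them.
        libxml_use_internal_errors(true);
        @$this->dom->loadHTML($this->data);
        libxml_clear_errors();
        libxml_use_internal_errors(false);
        $xpath = new DOMXPath($this->dom);
        $xpath_resultset = $xpath->query($query);
        // Without this check, saveHTML(null) would return the whole document.
        if ($xpath_resultset === false || $xpath_resultset->length === 0) {
            return '';
        }
        return $this->dom->saveHTML($xpath_resultset->item(0));
    }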
}

$c = new Foo($data);

$query = "//code[@id='u_0_d']";
$rs = $c->htmlGetContentBySelector($query);
// The full result.
// Uncomment the next line to print the entire block.
//echo $rs; exit;

/*
Now let's filter and extract what matters.

Here we get the photo.
*/
$query = "//img[@class='_fbBrowseXuiResult__profileImage img']";
$pic = $c->htmlGetContentBySelector($query, $rs);
echo $pic;

/*
Output:
<img class="_fbBrowseXuiResult__profileImage img" src="https://scontent-nrt1-1.xx.fbcdn.net/v/t1.0-1/c17.0.100.100/p100x100/FOTO-DO-PERFIL" width="100" height="100" alt="PROFILE NAME">
*/

/*
The profile name and URL.
*/
$query = "//div[@class='_gll']";
$name = $c->htmlGetContentBySelector($query, $rs);
echo $name;

/*
<div class="_gll"><div><a href="https://www.facebook.com/pagina-da-pessoa"><div class="_5d-4"><div class="_5d-5">PROFILE NAME</div></div></a></div></div>
*/

/*
The company where the person works.
*/
$query = "//div[@class='_glm']";
$job = $c->htmlGetContentBySelector($query, $rs);
echo $job;

/*
<div class="_glm"><div class="_pac" data-bt="{" ct>Job: <a href="https://www.facebook.com/pages/pagina-da-empresa">COMPANY NAME</a><div class="_1my"></div>
</div></div>
*/

The results still contain HTML markup, but they are easy to manipulate if you want to strip the HTML and keep only the data.
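As a minimal sketch of that last step, the fragments can be parsed again to pull out plain values. The function name extractProfileFields is hypothetical and not part of the script above; it assumes the fragment contains one img and one a element, as in the sample output shown in the comments. The XML prolog prefix is a commonly used workaround to make loadHTML read the input as UTF-8.

```php
<?php
// Hypothetical helper (not part of the Foo class above): pulls plain values
// out of an HTML fragment such as $pic . $name.
function extractProfileFields(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);  // tolerate imperfect markup
    // Assumption: the prolog below makes loadHTML treat the input as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $xpath = new DOMXPath($dom);
    $img   = $xpath->query('//img')->item(0);
    $link  = $xpath->query('//a')->item(0);

    return [
        // src of the first image, or null when there is none
        'photo' => $img instanceof DOMElement ? $img->getAttribute('src') : null,
        // href and visible text of the first link, or null when there is none
        'url'   => $link instanceof DOMElement ? $link->getAttribute('href') : null,
        'name'  => $link instanceof DOMElement ? trim($link->textContent) : null,
    ];
}
```

With the script above it could be called as extractProfileFields($pic . $name).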

The variable $rs contains something like this:

string(1558) "<code id="u_0_d"><!-- <div class="_4-u2 _4-u8"><div id="all_search_results" data-bt="{"session_id":"5505924b49749c699b44850e32fe24fa","typeahead_sid":null,"result_type":"all","referrer":"","path":"\/search\/all\/","experience_type":"simplepps"}"><div class="_1yt"><div class="_3u1 _gli _5und" data-bt="{"id":1251714145,"rank":null,"abtest_version":null,"abtest_params":[null],"section":"main_column","owner_id":null,"sub_id":null,"browse_location":null,"query_data":{"q":"email\u0040que.deseja.buscar"},"is_headline":false}"><div class="_401d"><div class="clearfix"><a class="_fbBrowseXuiResult__profileImageLink _8o _8s lfloat _ohe" href="https://www.facebook.com/pagina.da.pessoa" aria-hidden="true" tabindex="-1"><img class="_fbBrowseXuiResult__profileImage img" src="https://scontent-nrt1-1.xx.fbcdn.net/v/t1.0-1/c17.0.100.100/p100x100/xxxxxx-FOTO-DA-PESSOa-xxxxx_n.jpg?oh=74ae0b9e2cc130f9800f98d35d64ce36&oe=58AAB17B"width="100" height="100" alt="wa wa" /></a><div class="_42ef"><div class="_glj"><div class="clearfix"><div class="_glk rfloat _ohf"></div><div class="_gll"><div><a href="https://www.facebook.com/pagina.da.pessoa"><div class="_5d-4"><div class="_5d-5">PERSON NAME   </div></div></a></div></div></div><div><div class="_glm"><div class="_pac" data-bt="{"ct":"sub_headers"}">Job: <a href="https://www.facebook.com/pages/página-empresa-onde-trabalha/codigo-qualquer">NAME OF THE COMPANY WHERE THE PERSON WORKS</a><div class="_1my"></div></div></div><div class="_glo"></div></div><div class="_glp"></div><div class="_3t0c"></div></div></div></div></div></div></div></div></div> --></code>"
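The dump shows the result markup wrapped in an HTML comment inside the code element, which is presumably why the script strips the comment markers before parsing. The sketch below illustrates the effect on a shortened stand-in string (not real Facebook output); countMatches is a hypothetical helper introduced only for this illustration.

```php
<?php
// Hypothetical helper: counts how many nodes an XPath query matches.
function countMatches(string $html, string $query): int
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $nodes = (new DOMXPath($dom))->query($query);
    return $nodes === false ? 0 : $nodes->length;
}

// Shortened stand-in for the dump above, not real Facebook output.
$wrapped   = '<code id="u_0_d"><!-- <div class="_gll">NAME</div> --></code>';
// Same replacement the script applies to the downloaded page.
$unwrapped = str_replace(array('<!-- ', ' -->'), '', $wrapped);

// While the markers are present, the div is only text inside a comment node.
echo countMatches($wrapped, "//div[@class='_gll']"), "\n";
// After stripping them, the div is a real element that XPath can select.
echo countMatches($unwrapped, "//div[@class='_gll']"), "\n";
```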

Note: The Facebook URL obviously will not return data from profiles that are configured to hide it.

I cannot tell whether the search can return more than one profile. But since each email address is unique to a profile, we can take the risk of extracting the name, URL and profile photo without worrying about it.
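If more than one result did come back, a sketch like the following would collect all of them instead of only the first node. The function name extractAllProfiles is hypothetical; the '_gll' class is the one used in the script above and may not match the current markup.

```php
<?php
// Hypothetical helper: returns name and URL for every result block found,
// instead of only item(0).
function extractAllProfiles(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $xpath    = new DOMXPath($dom);
    $profiles = [];
    // '_gll' is the class the script above uses for the name/URL block.
    foreach ($xpath->query("//div[@class='_gll']//a") as $link) {
        $profiles[] = [
            'name' => trim($link->textContent),
            'url'  => $link->getAttribute('href'),
        ];
    }
    return $profiles;
}
```

Checking count() on the returned array then tells you whether the match was unique.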

It is also important to note that the values of the class and id attributes can change. The script above may stop working because of that, or for any other reason in the future, since it is a hack and not an official, documented method.
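One way to make such breakage visible, sketched under the assumption that you maintain a list of candidate selectors yourself, is to try several XPath queries and return null when none match. firstMatch and $imgByToken are hypothetical names, not part of the script above.

```php
<?php
// Hypothetical helper: returns the HTML of the first node matched by the
// first query that matches anything, or null when no query matches.
function firstMatch(string $html, array $queries): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $xpath = new DOMXPath($dom);
    foreach ($queries as $query) {
        $nodes = $xpath->query($query);
        if ($nodes !== false && $nodes->length > 0) {
            return $dom->saveHTML($nodes->item(0));
        }
    }
    return null; // the markup changed, or nothing was found
}

// Matches 'img' as one class token among several, rather than requiring
// the whole class attribute to equal a fixed string.
$imgByToken = "//img[contains(concat(' ', normalize-space(@class), ' '), ' img ')]";
```

A null return is then a signal to inspect the page by hand and update the selectors.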

Be aware that abnormal request patterns can get the requesting IP blocked, so use this sparingly.
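A simple way to space out requests is to enforce a minimum delay between calls. This is only a sketch: throttle is a hypothetical helper, and the interval you pass is your own guess, since no threshold that triggers a block is known here.

```php
<?php
// Hypothetical throttle: guarantees at least $minSeconds between two calls.
function throttle(float $minSeconds): void
{
    static $last = null; // time of the previous call, kept between calls

    $now = microtime(true);
    if ($last !== null && ($now - $last) < $minSeconds) {
        // Sleep for the remaining part of the interval (in microseconds).
        usleep((int) round(($minSeconds - ($now - $last)) * 1000000));
    }
    $last = microtime(true);
}
```

Calling throttle(5.0) right before each file_get_contents($url) would keep lookups at least five seconds apart.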

    
16.10.2016 / 07:05