Get Twitter information with cURL, without using the API

0

I have the following code:

$url = 'https://twitter.com/' . $username;

$user = curl_init();
curl_setopt_array($user, [
      CURLOPT_URL             => $url,
      CURLOPT_CUSTOMREQUEST   => 'GET',
      CURLOPT_CAINFO          => 'cacert-2017-06-07.pem',
      CURLOPT_RETURNTRANSFER  => true,
      CURLOPT_SSL_VERIFYPEER  => false,
      CURLOPT_SSL_VERIFYHOST  => 2,
      CURLOPT_HTTPHEADER      => [
        "Content-type:text/html;charset=utf-8",
      ],
      CURLOPT_USERAGENT       => $_SERVER['HTTP_USER_AGENT'],
      CURLOPT_HEADER          => true,
      CURLOPT_FOLLOWLOCATION  => true,
      CURLOPT_MAXREDIRS       => 2,
      CURLOPT_REDIR_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
      CURLOPT_POSTREDIR       => 2,
      CURLOPT_AUTOREFERER     => 1,
      CURLOPT_ENCODING        => "gzip"
  ]
);
$user_info = json_encode(curl_exec($user));
//$user_info = json_decode(curl_exec($user));

var_dump($user_info);
echo $user_info;

Well, this returns:

I would like to extract information like:

Screen_name, Name, Profile_img, etc.

A friend who runs a website said that it is possible, but he would not give in and teach me. What is the logic behind it? Is it possible?

Monitoring the network, I got this:

-H "accept-encoding: gzip, deflate, br"
-H "accept-language: pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4"
-H "upgrade-insecure-requests: 1"
-H "user-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
-H "accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
-H "cache-control: max-age=0"
-H "authority: twitter.com"
    
asked by anonymous 03.07.2017 / 00:52

1 answer

3

The Twitter page apparently has an input[type=hidden] field holding the data as JSON, which makes our life a lot easier. The result obtained from:

$user_info = curl_exec($user);

is nothing more than the raw HTTP response to the request, headers included, because CURLOPT_HEADER is enabled. To get just the body of the response, that is, the HTML code, just do:

$header_size = curl_getinfo($user, CURLINFO_HEADER_SIZE);
$header = substr($user_info, 0, $header_size);
$body = substr($user_info, $header_size);

Thus, $header will hold the HTTP response headers and $body the HTML code. To parse this HTML, we use the native DOMDocument class (never use regex for this):

$dom = new DOMDocument();
@$dom->loadHTML($body);

The @ on the second line suppresses the warnings generated by errors in the HTML of the Twitter page (several elements share the same id). The field mentioned above that holds the JSON is:

<input type="hidden" id="init-data" class="json-data" value="..." />

So just look for the id init-data in the DOM:

$json = $dom->getElementById("init-data")->getAttribute("value");

Then we use json_decode to convert it to an object:

$data = json_decode($json);
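
The lookup and decoding steps above can be tried offline. The following is a minimal sketch, assuming a hand-made HTML string in place of the real page; the helper name extractInitData and the sample payload are hypothetical and not part of the original answer. It uses libxml_use_internal_errors() as an alternative to the @ operator for hiding the parser warnings, and checks for a missing element before reading the attribute:

```php
<?php
// Hypothetical sample standing in for the downloaded Twitter page.
$payload = json_encode(['profile_user' => ['name' => 'Example User', 'screen_name' => 'example']]);
$body = '<html><body><input type="hidden" id="init-data" class="json-data" value="'
      . htmlspecialchars($payload, ENT_QUOTES) . '" /></body></html>';

/**
 * Returns the raw JSON string stored in the #init-data input, or null if absent.
 */
function extractInitData(string $html): ?string
{
    // Collect libxml parse warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $dom->getElementById('init-data');
    if ($node === null) {
        return null; // the field is missing, e.g. the page layout changed
    }
    return $node->getAttribute('value');
}

$json = extractInitData($body);
$data = $json === null ? null : json_decode($json);

// ?? falls back to the message when any part of the chain is missing.
echo $data->profile_user->screen_name ?? 'init-data not found', PHP_EOL;
```

Returning null instead of calling getAttribute() on a missing node avoids a fatal error if the field is not in the page.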

And we can access the information you want:

echo "Nome: ", $data->profile_user->name, PHP_EOL;
echo "Usuário: ", $data->profile_user->screen_name, PHP_EOL;
echo "Foto de perfil: ", $data->profile_user->profile_image_url, PHP_EOL;

In my case, the output was:

Nome: Anderson Carlos Woss
Usuário: acwoss
Foto de perfil: http://pbs.twimg.com/profile_images/827606791592747008/9EdeoXRp_normal.jpg
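
Since the page layout can change at any time, it may be worth guarding these property accesses. A rough sketch, assuming the same profile_user keys shown above; the function name readProfile and the sample JSON are illustrative only:

```php
<?php
/**
 * Decodes the init-data JSON and picks out a few profile fields.
 * Returns null when the JSON is invalid or the profile_user key is missing.
 */
function readProfile(string $json): ?array
{
    $data = json_decode($json, true); // true => associative arrays
    if (!is_array($data) || !is_array($data['profile_user'] ?? null)) {
        return null;
    }
    $user = $data['profile_user'];

    // Missing keys become null instead of raising a warning.
    return [
        'name'        => $user['name'] ?? null,
        'screen_name' => $user['screen_name'] ?? null,
        'image'       => $user['profile_image_url'] ?? null,
    ];
}

// Hypothetical input shaped like the fields used in the answer.
$sample = '{"profile_user":{"name":"Anderson Carlos Woss","screen_name":"acwoss"}}';
$profile = readProfile($sample);

echo 'Nome: ', $profile['name'], PHP_EOL;
echo 'Usuário: ', $profile['screen_name'], PHP_EOL;
var_dump(readProfile('not json')); // NULL
```

Decoding to arrays makes the ?? fallbacks straightforward; the object form used in the answer works just as well when the keys are known to exist.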

The whole code would look something like:

<?php

$url = 'https://twitter.com/' . $username;

$user = curl_init();
curl_setopt_array($user, [
      CURLOPT_URL             => $url,
      CURLOPT_CUSTOMREQUEST   => 'GET',
      CURLOPT_CAINFO          => 'cacert-2017-06-07.pem',
      CURLOPT_RETURNTRANSFER  => true,
      CURLOPT_SSL_VERIFYPEER  => true, // verify the peer against the CA bundle set in CURLOPT_CAINFO
      CURLOPT_SSL_VERIFYHOST  => 2,
      CURLOPT_HTTPHEADER      => [
        "Content-type:text/html;charset=utf-8",
      ],
      CURLOPT_USERAGENT       => $_SERVER['HTTP_USER_AGENT'],
      CURLOPT_HEADER          => true,
      CURLOPT_FOLLOWLOCATION  => true,
      CURLOPT_MAXREDIRS       => 2,
      CURLOPT_REDIR_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
      CURLOPT_POSTREDIR       => 2,
      CURLOPT_AUTOREFERER     => 1,
      CURLOPT_ENCODING        => "gzip"
  ]
);

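// Raw response: headers followed by the body, because CURLOPT_HEADER is true.
$user_info = curl_exec($user);
if ($user_info === false) {
    exit('cURL error: ' . curl_error($user) . PHP_EOL);
}

// Split the response at the header size reported by cURL.
$header_size = curl_getinfo($user, CURLINFO_HEADER_SIZE);
$header = substr($user_info, 0, $header_size);
$body = substr($user_info, $header_size);

// Parse the HTML; @ hides the warnings caused by the page's invalid markup.
$dom = new DOMDocument();
@$dom->loadHTML($body);

// The profile JSON is stored in the value attribute of #init-data.
$json = $dom->getElementById("init-data")->getAttribute("value");
$data = json_decode($json);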

echo "Nome: ", $data->profile_user->name, PHP_EOL;
echo "Usuário: ", $data->profile_user->screen_name, PHP_EOL;
echo "Foto de perfil: ", $data->profile_user->profile_image_url, PHP_EOL;
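
The header/body split can also be checked in isolation, without a network request. This is only a sketch, assuming a hand-written raw response; the helper splitResponse is a hypothetical name, and with cURL the header size would come from curl_getinfo($user, CURLINFO_HEADER_SIZE) rather than strlen():

```php
<?php
/**
 * Splits a raw HTTP response into [headers, body] at the given header size,
 * using the same substr() arithmetic applied with CURLINFO_HEADER_SIZE.
 */
function splitResponse(string $raw, int $headerSize): array
{
    return [substr($raw, 0, $headerSize), substr($raw, $headerSize)];
}

// Hypothetical raw response used only for this demonstration.
$headers = "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\n\r\n";
$raw = $headers . '<html><body>Hello</body></html>';

[$header, $body] = splitResponse($raw, strlen($headers));
echo $body, PHP_EOL; // <html><body>Hello</body></html>
```

Cutting at the size reported by cURL is safer than splitting on the first blank line, because with CURLOPT_FOLLOWLOCATION the reported size covers the headers of every response in the redirect chain.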
    
03.07.2017 / 01:53