Return entire page via cURL

I need to fetch a page that is served as text/html, but the response comes back compressed with zlib. I tried to decode it without success; zlib_decode (http://php.net/manual/pt_BR/function.zlib-decode.php) is not documented, and my searches turned up nothing useful. This is what I get back:

'HTTP/1.1 200 OK
Content-Type: text/html
X-Frame-Options: SAMEORIGIN
Vary: Cookie, Accept-Language, Accept-Encoding
Cache-Control: private, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Content-Language: pt-br
Content-Encoding: gzip
Date: Wed, 03 Jan 2018 19:03:06 GMT
Strict-Transport-Security: max-age=86400
Set-Cookie: some cookies'... (length=5280)

And here is my request:

function challenge($url) {
    $getCSRF = getCSRF();

    $request = curl_init();
    curl_setopt_array($request, array(
        CURLOPT_URL             => 'https://www.url.com/' . $url,
        CURLOPT_CUSTOMREQUEST   => 'GET',
        CURLOPT_HEADER          => true,
        CURLOPT_RETURNTRANSFER  => true,
        CURLOPT_SSL_VERIFYHOST  => false,
        CURLOPT_SSL_VERIFYPEER  => false,
        CURLOPT_COOKIE          => $getCSRF->cookies,
        CURLOPT_USERAGENT       => $_SERVER['HTTP_USER_AGENT'],
        CURLOPT_HTTPHEADER      => array(
            'accept-encoding:gzip, deflate, br',
            'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
            'accept-language:pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7',
        )
    ));
    $response = curl_exec($request);
    curl_close($request);

    return $response;
}

var_dump(challenge('challenge/id/Fn0C4GsZjg/'));
    
asked by anonymous 03.01.2018 / 21:50

1 answer

First, set CURLOPT_HEADER to false, like this:

CURLOPT_HEADER => false,

If you do need the headers, you can read the response metadata with curl_getinfo, or keep CURLOPT_HEADER enabled and split the string at the first blank line (two consecutive line breaks), but that is a separate matter.
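If the headers are needed alongside the body, one way to do the split is to keep CURLOPT_HEADER enabled and cut the string at the offset reported by curl_getinfo($request, CURLINFO_HEADER_SIZE). This is only a minimal sketch: splitResponse is a hypothetical helper name, and the sample response is made up for illustration.

```php
<?php
// Hypothetical helper: splits a raw cURL response (fetched with
// CURLOPT_HEADER => true) into its header block and its body.
// $headerSize is expected to come from
// curl_getinfo($request, CURLINFO_HEADER_SIZE).
function splitResponse($raw, $headerSize) {
    return array(
        'headers' => substr($raw, 0, $headerSize),
        'body'    => (string) substr($raw, $headerSize),
    );
}

// Made-up response, only to show the shape of the result.
$headers = "HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n";
$raw     = $headers . 'compressed bytes here';
$parts   = splitResponse($raw, strlen($headers));
```

Unlike searching for the first blank line, the reported header size also covers the extra header blocks that show up when redirects are followed.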

You also need to follow redirects so that you do not end up with empty pages, so you can use:

CURLOPT_FOLLOWLOCATION  => true,

And then finally you can use gzdecode , like this:

function challenge($url) {
    $getCSRF = getCSRF();

    $request = curl_init();
    curl_setopt_array($request, array(
        CURLOPT_URL             => 'https://www.url.com/' . $url,
        CURLOPT_CUSTOMREQUEST   => 'GET',
        CURLOPT_HEADER          => false,
        CURLOPT_RETURNTRANSFER  => true,
        CURLOPT_SSL_VERIFYHOST  => false,
        CURLOPT_SSL_VERIFYPEER  => false,
        CURLOPT_FOLLOWLOCATION  => true,
        // Same cookies as in the question's request
        CURLOPT_COOKIE          => $getCSRF->cookies,
        CURLOPT_USERAGENT       => $_SERVER['HTTP_USER_AGENT'],
        CURLOPT_HTTPHEADER      => array(
            'accept-encoding:gzip, deflate, br',
            'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
            'accept-language:pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7',
        )
    ));

    $response = curl_exec($request);

    $resposta_http = curl_getinfo($request, CURLINFO_HTTP_CODE);

    // Any status code outside the 200-299 range is probably an error page
    if ($resposta_http < 200 || $resposta_http > 299) {
        $response = null;
    }

    curl_close($request);

    // Decode the gzip body (false if the request failed)
    return $response ? gzdecode($response) : false;
}

var_dump(challenge('challenge/id/Fn0C4GsZjg/'));

Of course, the example above assumes the response is gzip. If it is deflate, you will probably have to use gzinflate (I'll prepare a more elaborate example).
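As a sketch of handling both cases, the decoder can be chosen from the Content-Encoding value. decodeBody is a hypothetical helper, and the encoding is assumed to have been read from the response headers elsewhere. The decoding functions come from PHP's zlib extension: gzdecode reads the gzip format, gzuncompress reads zlib-wrapped data (what HTTP calls deflate), and gzinflate reads raw deflate, which some servers send instead.

```php
<?php
// Hypothetical helper: decodes $body according to the Content-Encoding
// value. Returns the decoded string, or false if it cannot be decoded.
function decodeBody($body, $encoding) {
    switch (strtolower(trim($encoding))) {
        case 'gzip':
        case 'x-gzip':
            return gzdecode($body);
        case 'deflate':
            // Try the zlib wrapper first, then fall back to raw deflate.
            $decoded = @gzuncompress($body);
            return $decoded !== false ? $decoded : @gzinflate($body);
        case '':
        case 'identity':
            return $body; // not compressed
        default:
            return false; // e.g. "br", which the zlib extension cannot decode
    }
}

// Round trip with locally compressed sample data.
$html = '<html><body>ok</body></html>';
var_dump(decodeBody(gzencode($html), 'gzip') === $html);
var_dump(decodeBody(gzcompress($html), 'deflate') === $html);
var_dump(decodeBody(gzdeflate($html), 'deflate') === $html);
```

Since this sketch cannot decode br, it is probably safer to advertise only gzip and deflate in accept-encoding when decoding by hand.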

Another option, as @Bacco said, is simply not to send accept-encoding, like this:

curl_setopt_array($request, array(
    CURLOPT_URL             => 'https://www.url.com/' . $url,
    CURLOPT_CUSTOMREQUEST   => 'GET',
    CURLOPT_HEADER          => false,
    CURLOPT_RETURNTRANSFER  => true,
    CURLOPT_SSL_VERIFYHOST  => false,
    CURLOPT_SSL_VERIFYPEER  => false,
    CURLOPT_FOLLOWLOCATION  => true,
    CURLOPT_USERAGENT       => $_SERVER['HTTP_USER_AGENT'],
    CURLOPT_HTTPHEADER      => array(
        'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'accept-language:pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7',
    )
));

This is fine if the page is really lightweight and does not need to be compressed during the download.
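A further possibility, not part of the answer above and offered only as a sketch, is to let cURL negotiate and decompress the response itself through CURLOPT_ENCODING. According to the PHP manual, an empty string makes cURL send an Accept-Encoding header listing every encoding it supports and decode the body transparently, so neither a manual accept-encoding header nor gzdecode is needed. buildOptions is a hypothetical helper name, and the URL is the placeholder from the question.

```php
<?php
// Hypothetical helper: builds the option array for curl_setopt_array().
// Requires the cURL extension for the CURLOPT_* constants.
function buildOptions($url) {
    return array(
        CURLOPT_URL            => 'https://www.url.com/' . $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        // Empty string: advertise all supported encodings and let cURL
        // decompress the body before returning it.
        CURLOPT_ENCODING       => '',
    );
}

$options = buildOptions('challenge/id/Fn0C4GsZjg/');

// Usage (performs a real request, so it is left commented out):
// $request = curl_init();
// curl_setopt_array($request, $options);
// $html = curl_exec($request);
// curl_close($request);
```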

    
03.01.2018 / 22:27