filesize for files larger than 2GB on x86 platforms

12

I was reading the PHP documentation and noticed this information:

  

Note: Because PHP's integer type is signed and many platforms use 32bit integers, some filesystem functions may return unexpected results for files which are larger than 2GB.

When accessing the documentation in Portuguese I noticed this:

  

Note: Because PHP's integer type is signed and many platforms use 32-bit integers, filesize() may return unexpected results for files larger than 2GB. For files between 2GB and 4GB, you can work around this by using sprintf("%u", filesize($file)).

They suggest using sprintf; however, I found this question:

Apparently they tried several methods. I do not know whether it was a contributor to the Portuguese documentation who added this code:

sprintf("%u", filesize($file))

What I would like to know is whether it has any problems (since it seems that only the Portuguese documentation team thought of this). For example:

  • Does it fail in any particular situation?
  • Is it inaccurate with respect to the actual size of the file?
  • Or does the code really work to convert the integer size into a numeric string?
asked by anonymous 11.10.2015 / 00:02

2 answers

3

It seems that the problem goes further than that. Despite Edilson's report, I noticed that this does not work well, or at all, in every environment or PHP version.

On an x64 system, for a file larger than 4GB, the following returned a positive value that was not the actual file size, i.e. it did not work:

 sprintf("%u", filesize($file))

Even in an x64 environment with PHP 5 compiled for x64, PHP is still not fully 64-bit on Windows: the build is x86_64, but integers remain 32 bits wide (PHP 7 handles this better).
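A quick way to see which case applies is to inspect the integer width of the running PHP build. This is only a minimal sketch; the function name describeIntegerWidth is hypothetical and not part of the original answer:

```php
<?php
// PHP_INT_SIZE is 4 on builds with 32-bit integers and 8 on builds with
// 64-bit integers; PHP_INT_MAX is the largest value filesize() can return.
function describeIntegerWidth(): string
{
    return sprintf('%d-bit integers, PHP_INT_MAX = %d', PHP_INT_SIZE * 8, PHP_INT_MAX);
}

echo describeIntegerWidth(), "\n";
```

If this reports 32-bit integers, filesize() cannot represent sizes of 2GB or more directly, whatever the operating system's architecture.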

The problem is not exactly PHP's fault, but PHP 5 works with 32-bit integers, and even 64-bit builds have a limit. What I needed was something that works almost regardless of the environment. I do not need to do calculations with the value; I just need to know the size of a file. I came up with these solutions:

Native system software

This solution depends on stat being available, as it is on Linux, macOS, and BSD servers, for example; I do not know whether it works on all platforms. For Windows I used this answer from Stack Overflow (in English).

Something like:

  • Unix-like: stat -c %s arquivopesado.7z (this command varies between Unix-like systems, including macOS, so you may have to adjust it)

  • Windows: for %F in ("arquivopesado.7z") do @echo %~zF

The script looks like this:

<?php
function filesizealternativo($arquivo)
{
    if (is_file($arquivo) === false) {
        return false;
    }

    $arqarg = escapeshellarg(realpath($arquivo));

    if (strcasecmp(substr(PHP_OS, 0, 3), 'WIN') === 0) {
        $command = 'for %F in (' . $arqarg . ') do @echo %~zF';
    } else {
        $command = 'stat -c %s ' . $arqarg;
    }

    $resposta = shell_exec($command);

    if ($resposta === null) {
        return false;
    }

    $resposta = trim($resposta);

    if (is_numeric($resposta)) {
        return $resposta;
    }

    return false;
}

$a = filesizealternativo('arquivogrande.7z');

var_dump($a);
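
The script above always uses the GNU form of stat on non-Windows systems. As a sketch of how the variations could be handled, the hypothetical helper below (buildSizeCommand is not part of the original answer) assumes GNU stat on Linux and BSD-style stat, which takes -f %z, on macOS and the BSDs:

```php
<?php
// Hypothetical helper: picks a size-printing command per operating system.
// Assumption: GNU stat ("-c %s") on Linux, BSD stat ("-f %z") on macOS/BSD.
function buildSizeCommand(string $path, string $os = PHP_OS): string
{
    $arg = escapeshellarg($path);

    if (strcasecmp(substr($os, 0, 3), 'WIN') === 0) {
        return 'for %F in (' . $arg . ') do @echo %~zF';
    }

    if ($os === 'Darwin' || substr($os, -3) === 'BSD') {
        return 'stat -f %z ' . $arg; // BSD-style stat
    }

    return 'stat -c %s ' . $arg; // GNU coreutils stat
}

echo buildSizeCommand('arquivogrande.7z', 'Linux'), "\n";
echo buildSizeCommand('arquivogrande.7z', 'Darwin'), "\n";
```

The second parameter exists only so the choice can be checked without changing machines; in real use it would default to PHP_OS.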

Using the file:// protocol with PHP

The problem with stat is compatibility across servers and its dependencies, and some servers also block functions such as shell_exec, exec, and system. So I ran a test with cURL and file://, and the result worked well:

function filesizealternativo2($arquivo)
{
    if (is_file($arquivo) === false) {
        return false;
    }

    $arquivo = realpath(preg_replace('#^file:#', '', $arquivo));

    // Build a file:/// URL with an empty host; backslashes become slashes on Windows
    $ch = curl_init('file:///' . ltrim(str_replace('\\', '/', $arquivo), '/'));

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Return the response instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, 1); // Include the headers in the response
    curl_setopt($ch, CURLOPT_NOBODY, 1); // Do not fetch the body

    $headers = curl_exec($ch);
    curl_close($ch);

    $ch = null;

    // curl_exec returns false when the request fails
    if ($headers === false) {
        return false;
    }

    // Extract the size reported in Content-Length with preg_match
    if (preg_match('#(^c|\sc)ontent\-length:(\s|)(\d+)#i', $headers, $matches) > 0) {
        return $matches[3];
    }

    return false;
}

$a = filesizealternativo2('arquivogrande.7z');

var_dump($a);

This way the only dependency is the cURL extension, which is already enabled on many servers.
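
The header-parsing step can be tried on its own, without cURL or a large file. The sketch below uses a hypothetical helper, parseContentLength, and a made-up header block; the point it illustrates is that the size stays a string, so it is never truncated to a 32-bit integer:

```php
<?php
// Hypothetical helper: extracts Content-Length from a raw header block.
// Returns the digits as a string, or false when the header is absent.
function parseContentLength(string $headers)
{
    // "m" lets ^ match at the start of every header line, "i" ignores case
    if (preg_match('/^content-length:\s*(\d+)/mi', $headers, $m) === 1) {
        return $m[1];
    }

    return false;
}

// Sample headers invented for this example
$sample = "Content-Length: 5000000000\r\nAccept-ranges: bytes\r\n";

var_dump(parseContentLength($sample));
```

If calculations on the value are needed on a 32-bit build, the string would have to go through an arbitrary-precision extension rather than being cast to an integer.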

    
10.02.2017 / 00:55
5

It seems that the problem occurs because PHP's integer type is signed and many platforms use 32-bit integers, which is why filesize() sometimes returns unexpected results for files larger than 2GB.

As for why this expression is considered more appropriate, that is a bit complicated to answer, since several users have tried in many ways to write their own, sometimes more complex, methods to get the actual size of a file.

  

It prints the result of filesize the UNSIGNED INT so it can be until 4GB.   The reason is, SIGNED INT runs until 2GB and flips to -2GB watch following:

     

Translation:   This prints the result of filesize as an unsigned integer, so it can go up to 4GB. This happens because signed integers run up to 2GB and then flip to -2GB; see:

file<2GB      = SIGNED:  1048576512 UNSIGNED: 1048576512
file>2GB      = SIGNED: -2100140103 UNSIGNED: 2194827193
file>4GB      = SIGNED:  -100662784 UNSIGNED: 4194304512

The text quoted above was taken from a PHP manual page, where a user explains why the expression is used. But it does not say whether it is the right choice or not.

In my opinion, this expression is most likely used because filesize() returns negative values for files between 2GB and 4GB, which can still be corrected with some arithmetic, whereas for files over 4GB it returns a value that is definitively wrong and cannot be corrected. It was indeed somewhat alarming that the example appears only in the note of the Portuguese documentation, but the example already existed in the user-contributed notes.
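
The arithmetic behind this can be simulated without a large file. The sketch below assumes a 64-bit PHP build (so the real sizes fit in a native integer) and uses two hypothetical helpers, wrapToSigned32 and signed32ToUnsigned, to mimic what a 32-bit build stores and what "%u" recovers from it:

```php
<?php
// Hypothetical helpers for illustration; assumes a 64-bit PHP build.

// Simulate the value a 32-bit signed integer would hold for a byte count.
function wrapToSigned32(int $bytes): int
{
    $low = $bytes & 0xFFFFFFFF; // keep only the low 32 bits
    return $low >= 0x80000000 ? $low - 0x100000000 : $low;
}

// Reinterpret a 32-bit signed value as unsigned, as "%u" does on 32-bit PHP.
function signed32ToUnsigned(int $value): int
{
    return $value < 0 ? $value + 0x100000000 : $value;
}

$sizes = [1048576512, 2194827193, 4194304512, 5000000000];

foreach ($sizes as $real) {
    $signed = wrapToSigned32($real);
    $unsigned = signed32ToUnsigned($signed);
    printf(
        "real=%d signed=%d unsigned=%d recovered=%s\n",
        $real,
        $signed,
        $unsigned,
        $real === $unsigned ? 'yes' : 'no'
    );
}
```

The first three sizes are the ones from the quoted table and are recovered exactly. The last one is at or above 2^32 bytes, so its high bits are lost before "%u" is applied and the result is a plausible but wrong positive number.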

The examples found on the PHP page are simpler, but that does not mean this is the only way to get the actual size of a file. It will likely require some testing on your part, because there is not much information about why sprintf is used.

  

Does it fail in any particular situation?

Some users have reported failures on some x86-based systems, and some problems have been reported on x64 systems as well, so errors are still likely in some environments. If filesize() itself fails, it emits an E_WARNING and returns FALSE.
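
Because sprintf("%u", false) produces "0", a failed filesize() call would look like an empty file. A minimal sketch of guarding against that, using a hypothetical wrapper named unsignedFilesize:

```php
<?php
// Hypothetical wrapper: only formats the size when filesize() succeeded.
function unsignedFilesize(string $file)
{
    $size = @filesize($file); // "@" silences the E_WARNING in this sketch

    if ($size === false) {
        return false;
    }

    return sprintf('%u', $size);
}

// A path that is assumed not to exist, to show the failure case
var_dump(unsignedFilesize(__DIR__ . '/does-not-exist.bin'));
```

This does not fix the over-4GB case; it only keeps a read failure from being reported as a size of zero.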

  

Is it inaccurate with respect to the actual size of the file?

The precision is good: it returns the actual size in bytes.

  

Or does the code really work to convert the integer size into a numeric string?

Yes, it works. This was the output I got in my last test:

$file = "ficheiro.zip";
var_dump(sprintf("%u", filesize($file)));

Output: string(4) "5209" (5.08KB)
Output: string(10) "2092964971" (1.94GB)

There are several examples available on how to get the actual file size on a variety of platforms, some even based on the shell; just look for the one that suits you best. If you need more detail, I believe the only option is to run isolated tests and dig deeper in your research.

Good luck.

References:

PHP.cz

PHP.tw

PHP.edu

Drupal.org

PHP.net

    
11.10.2015 / 07:08