How do I count the number of lines in a large file in PHP?

6

How do I know the number of lines in a file using PHP?

I know there are functions like file, which returns all the lines of a file as an array. We could simply call count on the result, but I need to do this for a 60 MB file, and I do not think using file is a good idea in that case.

Is there any other way to do this?

How can I find out, for example, how many lines a 2 GB file has without exhausting PHP's memory?

Is there a smarter way to count the lines of a file in PHP?

    
asked by anonymous 13.05.2016 / 18:55

4 answers

3

You have to read it in chunks of data. More or less like this:

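$file = fopen("teste.txt", 'r');
$count = 0;
// Loop on the return value of fgets(): testing feof() first can run one extra iteration when the file ends with a line break
while (($line = fgets($file, 4096)) !== false) { // I would probably use a larger value, never a smaller one
    $count++;
}
fclose($file);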

I set a limit of 4096 because reading without one is risky if the file is very large and does not have enough line breaks to split it into small chunks. This solution is not perfect: a line longer than the limit is counted more than once. A better one would need a considerably more sophisticated algorithm.

I thought of another one, which also has problems:
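
One way to make the fgets version tolerate lines longer than the limit is to count a line only when the piece that was read ends with a line feed. This is a minimal sketch, not a definitive implementation: the function name countLinesWithFgets is hypothetical, and it assumes that a final line without a trailing "\n" should still count as a line.

```php
<?php
// Hypothetical helper: counts lines with fgets() while tolerating
// lines that are longer than the read limit.
function countLinesWithFgets(string $path, int $length = 4096): int
{
    $file = fopen($path, 'r');
    if ($file === false) {
        throw new RuntimeException("Unable to open $path");
    }

    $count = 0;
    $lineOpen = false; // true while the current line has not been terminated yet

    // fgets() returns at most $length - 1 bytes and stops after a "\n".
    while (($piece = fgets($file, $length)) !== false) {
        if (substr($piece, -1) === "\n") {
            $count++;          // a line terminator was reached
            $lineOpen = false;
        } else {
            $lineOpen = true;  // part of a long line, or a last line without "\n"
        }
    }
    fclose($file);

    // Assumption: an unterminated last line still counts as one line.
    return $lineOpen ? $count + 1 : $count;
}
```

Memory use stays bounded by the limit passed to fgets, regardless of how long an individual line is.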

$file = fopen("teste.txt", 'rb');
$count = 0;
while (!feof($file)) {
    $chunk = fread($file, 4096); // I would probably use a larger value, never a smaller one
    $count += substr_count($chunk, "\n");
}
fclose($file);

A line break can be more than one character long, so one character may land in one chunk and the other in the next. In that case it will not be counted.

Production-ready solutions would have to take this into account and handle it when it happens. This is easier to solve in the second algorithm, which also has the advantage of never filling up your memory.

Run tests to find the best chunk size. I used 4 KB because it is the size of a memory page and the most common file system cluster size. Smaller values will perform worse and are more likely to cut a line in half, which disrupts both algorithms. Larger values can give much better results. I would venture to say that the larger the better, but it depends on the hardware, OS, usage pattern, and so on. A value can do well in testing and still cause problems in production. If the whole file could be read at once, that would always be the fastest and risk-free option.
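
A rough way to run such a test is to time the same newline count with several chunk sizes. The sketch below is only an illustration: the function name benchmarkChunkSizes and the sizes in the usage comment are assumptions, and hrtime() requires PHP 7.3 or later.

```php
<?php
// Hypothetical benchmark: times a newline count over one file
// for each chunk size in $sizes.
function benchmarkChunkSizes(string $path, array $sizes): array
{
    $results = [];
    foreach ($sizes as $size) {
        $start = hrtime(true); // monotonic clock, in nanoseconds

        $file = fopen($path, 'rb');
        if ($file === false) {
            throw new RuntimeException("Unable to open $path");
        }
        $count = 0;
        while (!feof($file)) {
            $chunk = fread($file, $size);
            if ($chunk === false || $chunk === '') {
                break; // nothing more to read
            }
            $count += substr_count($chunk, "\n");
        }
        fclose($file);

        $results[$size] = [
            'newlines' => $count,
            'seconds'  => (hrtime(true) - $start) / 1e9,
        ];
    }
    return $results;
}

// Usage sketch ("teste.txt" is the file name used in the answer):
// foreach (benchmarkChunkSizes('teste.txt', [4096, 65536, 1048576]) as $size => $r) {
//     printf("%8d bytes: %d newlines in %.4f s\n", $size, $r['newlines'], $r['seconds']);
// }
```

The first pass over a file is often slower than later ones because the operating system caches the file, so it is worth repeating the measurement a few times before comparing sizes.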

GuilhermeLautert pointed out that \n is only the line feed, and therefore a split line break would not cause a problem. But on Windows the line break is \r\n (I cannot test this). It is one of two things: either PHP treats the \n in the code as a full line break and what I described happens, or this code would not work correctly on Windows and would need \r\n in the code to match the break correctly, which would have the same problem of splitting the line break marker.
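
On the \r\n question: in binary mode the bytes are read untranslated, and a "\r\n" pair contains exactly one "\n" byte, so counting single "\n" bytes gives one match per line break even when the chunk boundary falls between "\r" and "\n". Splitting would only matter if the code searched for the two-byte string "\r\n". The sketch below illustrates this; the function name countLinesInChunks and the default chunk size are assumptions, and files that use a lone "\r" as the line break are not handled.

```php
<?php
// Hypothetical helper: counts line feeds chunk by chunk, so memory use
// stays at roughly one chunk regardless of the file size.
function countLinesInChunks(string $path, int $chunkSize = 65536): int
{
    $file = fopen($path, 'rb'); // binary mode: bytes are read untranslated
    if ($file === false) {
        throw new RuntimeException("Unable to open $path");
    }

    $count = 0;
    $lastByte = "\n"; // an empty file has no unterminated last line

    while (!feof($file)) {
        $chunk = fread($file, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // nothing more to read
        }
        // "\r\n" contains a single "\n", so LF and CRLF files count the same.
        $count += substr_count($chunk, "\n");
        $lastByte = $chunk[strlen($chunk) - 1];
    }
    fclose($file);

    // Assumption: a final line without a trailing "\n" still counts as a line.
    return $lastByte === "\n" ? $count : $count + 1;
}
```

Calling it with a deliberately tiny chunk size, such as 2, on a file with Windows line breaks is a quick way to check that split "\r\n" pairs do not change the result.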

13.05.2016 / 19:04
5

Both methods (fgets() and file()) use a loop to read the file, which is unavoidable. Whether implicitly or explicitly, there will be a loop going through all the lines of the file.

But you only want to know the number of lines, so the size of the file does not matter, because you are only keeping a counter. Do this:

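$myfile = fopen("meuArquivo.txt", "r") or die("Unable to open file!");
$count = 0;
// Each iteration must actually read a line; without a read, feof() never becomes true
while (fgets($myfile) !== false) {
  $count++;
}
fclose($myfile);
echo $count;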
    
13.05.2016 / 21:54
3

As a lover of OOP in PHP, I would do this with an SplFileObject object:

$file = new \SplFileObject('file.extension', 'r');
$file->seek(PHP_INT_MAX);
echo $file->key() + 1; 

I use PHP_INT_MAX to seek to the last line of the file; seek is available because SplFileObject implements SeekableIterator.

Then, since line numbering starts at 0, I have to add 1 to get the right value.

Another detail: since I am using SplFileObject, a large file is iterated line by line, which saves memory and makes it possible to count the lines of a huge file without crashing the script.

    
13.05.2016 / 19:17
2

As a regex lover, I propose:

$content = file_get_contents("file_name");          //  reads the whole file
$content = preg_replace('~[^\n]~', '', $content);   //  removes everything that is not a line break (\n)
print_r(strlen($content)+1);                        //  counts the bytes that are left, +1 because the end of the file has no \n
    
13.05.2016 / 19:13