Why is array_shift considered to be a lentissima function?

6

According to the Manual, array_shift removes the first element from the array.

I've seen a lot of criticism from the internet as to how to perform this function, since it reordered the entire index of the array with each removal, thus reallocating the entire array.

I found this graph that represents the difference of performasse between array_shift and array_pop (remove the last element of the array), which shows an absurd difference.

Ihaveseenalteratives,liketheexamplebelow,toavoidthe"performance problem" (because of reindexing the indices) as follows:

function array_first(&$array)
{
   reset($array);

   $key = key($array);

   $value = $array[$key];

   unset($array[$key]);


    return $value;
}
  • Creating a "manual" alteration really is the best way to solve this problem?

  • What makes this function slow is it simply the reindexing of the elements, or are there other factors as well?

asked by anonymous 03.03.2015 / 14:22

3 answers

2

The performance of this function is slow on certain occasions due to the reindexing that is done to reorder the items in the array , it has to be taking into consideration that its complexity is O(n) .

Assuming we have an array with 5 values and want to remove the first element from the array with the function array_shift , the process would look something like this:

array     >>  a b c d e f
copia     >> [a]
deleta    >> [a]
array     >>  " b c d e f
reindexa  >>  " b c d e f
                ↙ ↙ ↙ ↙
array     >>  b c d e f   

The inner workings of array_shift are basically so :

  • Gets the first value, where arData is an array used to set the values of array received:

    >> idx = 0
    >> while(1)
    >>   p = Z_ARRVAL_P(stack)->arData + idx;
    >>   val = &p->val;
    
  • Copy this value to return as a return:

    >> ZVAL_COPY(return_value, val);
    
  • After copying the value, it is deleted:

    >> if (Z_ARRVAL_P(stack) == &EG(symbol_table).ht) // <-- se for Hashtable
         zend_delete_global_variable(p->key)
    >> else
         zend_hash_del(Z_ARRVAL_P(stack), p->key)
    
  • Now the reindexing of the items is done:

    >> if (Z_ARRVAL_P(stack)->u.flags & HASH_FLAG_PACKED) { // <-- Se for array compactada
    >>   uint32_t k = 0;
    >>   for (idx = 0; idx < Z_ARRVAL_P(stack)->nNumUsed; idx++) 
    >>      p = Z_ARRVAL_P(stack)->arData + idx;
    >>      if (Z_TYPE(p->val) == IS_UNDEF) continue; // <-- Se não existir valor, continua
            if (idx != k) // <-- Se for diferente de 0, 0: item já não existente
                Bucket *q = Z_ARRVAL_P(stack)->arData + k;
                q->h = k;
                q->key = NULL;
                ZVAL_COPY_VALUE(&q->val, &p->val);
                ZVAL_UNDEF(&p->val);
            }
            k++;
        }
    
  • This reindexing process may not get into the above code block if the array does not have the flag HASH_FLAG_PACKED (indicates that it is a array ) and can enter the code block of else at the 2246 .

Whether or not to use this function may depend on the size of the array being parsed, there are alternatives like using the array_pop in conjunction with array_reverse or reset with array_pop or unset($array[$indice]) , is last maybe is not as slow as array_shift .

    
03.03.2015 / 18:13
1

The function becomes slow just reindexing the elements, in its same graph it becomes very clear that the greater the number of elements in the array the greater the execution time.

And to my mind there's no problem getting around this with some other alternative, like implementing your own function.

If you want to analyze it follows the code of the implementation of the functions in PHP link

On line 2167 has the implementation > array_shift and the 2106 line array_pop , both are similar only by noting the fact that reindexing array elements in the array_shift function, as described in 2218

    
03.03.2015 / 14:38
1
  

Creating a "manual" alteration is really the best way to solve   this problem?

The array_shift function has a runtime of 2 n , that is, the larger the array , the more time it will take. This is only a problem if you are working with very large arrays .

According to the Benchmark , up to a size of 1000 the performance is almost the same as array_pop .

  

What makes this function slow is simply the reindexing of   elements, or are there other factors as well?

The function itself is not slow. The factor that causes the slowness is the data load it has to work. Redoing the function with a generic implementation may have the opposite effect with small arrays , because kernel functions are faster than user functions.

It will be very difficult to find a specific case that the function will always receive a giant load of data. If this is the case and the array keys are not relevant, use an alternative implementation only in this code snippet.

Another alternative to array_shift can be achieved by combining array_pop with array_reverse :

<?php

$array = array_reverse(getArrayGigante());

array_pop($array);
    
03.03.2015 / 15:24