Why do these two ways of initializing the same list in Python generate structures of different sizes?

6

It is common to need to initialize a list in Python with a defined amount of elements and we can do this in two ways: 1) multiplying the list with one element by the desired quantity; or 2) use the list comprehensions technique.

  

Note: for mutable objects, use list comprehension .

1. Initializing the list by multiplying

>>> import sys
>>> lista1 = [None]*15
>>> print(lista1)
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
>>> print(sys.getsizeof(lista1))
184

2. Initializing with list comprehension

>>> import sys
>>> lista2 = [None for _ in range(15)]
>>> print(lista2)
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
>>> print(sys.getsizeof(lista2))
192

It's interesting to note that the created objects have, in fact, the same value:

>>> print(lista1 == lista2)
True

So where, and why, did this 8-byte difference occur between objects? Both have exactly the same amount (15) of the same value ( None ). Should not they be the same size?

    
asked by anonymous 01.08.2018 / 02:50

1 answer

5

This is an expected behavior in Python in relation to the reallocation of resources from a list.

Whenever memory is reallocated to a list, Python will allocate more memory than it actually does to prevent future relocation in the near future - it is simpler for you to reallocate n * X from memory once and change the list n times than having to relocate memory X by n times.

This is made explicit in a comment in the list_resize official repository:

/* This over-allocates proportional to the list size, making room
 * for additional growth.  The over-allocation is mild, but is
 * enough to give linear-time amortized behavior over a long
 * sequence of appends() in the presence of a poorly-performing
 * system realloc().
 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
 */
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);

That is, the memory size that will be reallocated to the list will be proportional to its current size. This is evident when you modify a list within a repeat loop and check its current size:

import sys

lista = []

for _ in range(20):
    print('Lista de tamanho {:>3} com memória {:>3}'.format(len(lista), sys.getsizeof(lista)))
    lista.append(None)

The result will be:

Lista de tamanho   0 com memória  64
Lista de tamanho   1 com memória  96
Lista de tamanho   2 com memória  96
Lista de tamanho   3 com memória  96
Lista de tamanho   4 com memória  96
Lista de tamanho   5 com memória 128
Lista de tamanho   6 com memória 128
Lista de tamanho   7 com memória 128
Lista de tamanho   8 com memória 128
Lista de tamanho   9 com memória 192
Lista de tamanho  10 com memória 192
Lista de tamanho  11 com memória 192
Lista de tamanho  12 com memória 192
Lista de tamanho  13 com memória 192
Lista de tamanho  14 com memória 192
Lista de tamanho  15 com memória 192
Lista de tamanho  16 com memória 192
Lista de tamanho  17 com memória 264
Lista de tamanho  18 com memória 264
Lista de tamanho  19 com memória 264
  

Note: Notice that for this example I modified the list from the append method. That is, this behavior is expected for any resource reallocation in the list, regardless of the source of this reallocation.

While it is possible to amortize the reallocation time of resources in a list - that can be beneficial to the performance of the application - there are setbacks in the use of resources.

If you have a list with 16 values, None , and need to add another, your list will consume 264 bytes in memory instead of just 200, which would be necessary - but remember, it's Python. You'll eventually give up some requirements to get others.

    
01.08.2018 / 14:21