List best practice: start with a fixed capacity or without a limit?

4

I have a scenario where I receive a list, an array, or some other collection from the database, and I know the size of my list before creating it. What is the advantage of newList over newList2 in my code below? Is there any performance benefit in this case? The types (int and object) are just to illustrate the example.

public void TestMethod(List<int> amounts)
{
    List<object> newList = new List<object>(amounts.Count);
    List<object> newList2 = new List<object>();
}
    
asked by anonymous 14.03.2018 / 11:12

3 answers

9

I tested both cases and the performance difference was negligible with small values, but memory allocation is something that should be taken into account.

When you create the list with capacity 0 (zero), .NET increases and allocates space dynamically as new items are added, doubling the previous capacity each time it runs out of room. In practical terms, if your list has 4 elements and its capacity is 4, adding one more element doubles the capacity, allocating space for 8 elements. Because the growth is exponential, when you have 64 elements and add one more, a capacity of 128 is allocated; when you have 8192 elements and add another, the capacity goes to 16384, and so on.

From this we can conclude that, from the point of view of space and memory allocation, it is more advantageous to set the list's capacity when we know the size and it is large. How large? You would have to analyze the type of object stored in the list to calculate the memory allocated, but anything with more than 1000 items is already worth setting the capacity for.

Here is a code that shows this:

System.Collections.Generic.List<int> lista = new System.Collections.Generic.List<int>();
Random rand = new Random();
int i = 0;
while (i < 1000)
{
    lista.Add(rand.Next(0, 20000));
    i++;
    Console.Write("\nSize: " + lista.Count.ToString());
    Console.Write("  Capacity: " + lista.Capacity.ToString());
}

It can be run here: link

Output example:

  Size: 31  Capacity: 32
  Size: 32  Capacity: 32
  Size: 33  Capacity: 64
  Size: 34  Capacity: 64
Note that the capacity doubles, and so does the space allocated in memory.

EDIT: I ran the same code on dotnetfiddle with 50,000, 500,000, and 1,000,000 items, both without initializing the capacity and initializing it. Below are the results showing the difference in memory consumption. Once again, the performance in terms of time/CPU was irrelevant:

+-----------+----------------------+-------------------+
| Items     | Without initializing | Initializing      |
+-----------+----------------------+-------------------+
| 50,000    | Memory: 512.23 KB    | Memory: 195.34 KB |
+-----------+----------------------+-------------------+
| 500,000   | Memory: 4.01 MB      | Memory: 1.91 MB   |
+-----------+----------------------+-------------------+
| 1,000,000 | Memory: 8.01 MB      | Memory: 3.81 MB   |
+-----------+----------------------+-------------------+
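As a rough sketch of how such a measurement can be done (the figures vary by runtime, and forced collections make them only approximate; this is not the exact code used for the table above):

```csharp
using System;
using System.Collections.Generic;

class MemoryComparison
{
    // Approximate memory used by a list of 'items' ints, built with or
    // without an initial capacity. GC.GetTotalMemory(true) forces a
    // collection before measuring, so the figures are rough estimates.
    static long Measure(int items, bool preallocate)
    {
        long before = GC.GetTotalMemory(true);
        var list = preallocate ? new List<int>(items) : new List<int>();
        for (int i = 0; i < items; i++)
            list.Add(i);
        // Not forcing a collection here, so arrays discarded during
        // growth may still be counted, mirroring peak consumption.
        long after = GC.GetTotalMemory(false);
        GC.KeepAlive(list);
        return after - before;
    }

    static void Main()
    {
        const int n = 500000;
        Console.WriteLine("Without initializing: {0:N0} bytes", Measure(n, false));
        Console.WriteLine("Initializing:         {0:N0} bytes", Measure(n, true));
    }
}
```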
    
14.03.2018 / 12:30
6

The advantage of using a list with a defined capacity is that there will be no future reallocation of the list's items. When you use a list without setting the size, it begins at zero and grows as needed. The capacity will always be greater than or equal to the number of elements in the list.

In short, you gain performance by using a list with a defined capacity because, internally, .NET will not need to reallocate the existing elements in order to add a new one.
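A small sketch can make the reallocations visible by watching Capacity change while items are added (the exact counts depend on the runtime's growth policy):

```csharp
using System;
using System.Collections.Generic;

class ReallocationDemo
{
    // Adds 'count' items and counts how many times Capacity changed,
    // i.e. how many times the internal array had to be reallocated.
    static int CountCapacityChanges(List<int> list, int count)
    {
        int changes = 0, lastCapacity = list.Capacity;
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                changes++;
                lastCapacity = list.Capacity;
            }
        }
        return changes;
    }

    static void Main()
    {
        // With a preset capacity, Capacity never changes while filling:
        Console.WriteLine(CountCapacityChanges(new List<int>(1000), 1000)); // 0

        // Without one, the backing array is reallocated several times:
        Console.WriteLine(CountCapacityChanges(new List<int>(), 1000));
    }
}
```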

    
14.03.2018 / 12:17
1

If nothing is specified, the list starts with capacity 0 and jumps to 4 when you add the first item. After that, it doubles the capacity whenever all the free slots are occupied. At least that is the current implementation; nothing guarantees it will always be so.
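A minimal sketch that prints the capacity each time it changes shows this growth pattern (on the current implementation; the exact sequence is not guaranteed):

```csharp
using System;
using System.Collections.Generic;

class GrowthSequence
{
    static void Main()
    {
        var list = new List<int>();
        int last = -1;
        for (int i = 0; i < 100; i++)
        {
            // Report the capacity whenever it differs from the last one seen.
            if (list.Capacity != last)
            {
                last = list.Capacity;
                Console.WriteLine(last);
            }
            list.Add(i);
        }
        // On current .NET this prints 0, 4, 8, 16, 32, 64, 128.
    }
}
```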

If you already know how many elements the list will have, or at least an approximation, there is a gain in setting the initial capacity, to avoid memory reallocation. If you have the information, there is zero advantage in not setting the initial capacity; even if the gain is not that great, it is not that small either. Not specifying the capacity can be 40-50% slower, and that is on an operating system that optimizes reallocations; on one that does not, it can turn into a tragedy.

In addition, having to reallocate memory every time the size grows puts pressure on the garbage collector, which will be triggered more often and with longer pauses. In general, the tests people run ignore this. We should always avoid allocations, even small ones. And many allocations are not easily visible, which is the case with a growing list.

It is preferable to allocate more elements than you are going to use than to do memory reallocations. Of course, there is always a point beyond which it no longer pays off.
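And if the estimate overshoots, the unused slack can later be reclaimed with TrimExcess, at the cost of one extra reallocation. A small sketch:

```csharp
using System;
using System.Collections.Generic;

class OverAllocation
{
    static void Main()
    {
        // Overestimate: reserve 10,000 slots but only use 6,000.
        var list = new List<int>(10000);
        for (int i = 0; i < 6000; i++)
            list.Add(i);

        Console.WriteLine(list.Capacity); // 10000: the slack is still allocated

        // One extra reallocation shrinks the backing array to fit.
        // (TrimExcess is a no-op if the list is already over 90% full.)
        list.TrimExcess();
        Console.WriteLine(list.Capacity); // 6000
    }
}
```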

    
17.03.2018 / 17:11