Unexpected result when using Parallel.ForEach

2

I have a class that holds a list of strings, with the following structure:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class Teste
{
    private List<string> _codigos;

    public void InsertDB(string[] files)
    {
        _codigos = new List<string>();

        Parallel.ForEach(files, file => Processa(file));

        Console.WriteLine(_codigos.Count);
    }

    // Instance method (not static) so that it can access the _codigos field
    private void Processa(string file)
    {
        // Does some processing
        string resultado = "Gets a result";
        _codigos.Add(resultado);
    }
}

The problem is this: if my files array has 7000 elements, my _codigos list should also end up with 7000 elements. But that does not happen. Every time I run the program, the list ends up with 6989, 6957, 6899, etc. The count is different on every run.

The interesting thing is that when I replace Parallel.ForEach() with a simple foreach as follows:

foreach(string file in files) {
    Processa(file);
}

Then I do get the expected result: _codigos with 7000 elements.

What am I doing wrong?

    
asked by anonymous 08.02.2018 / 14:09

2 answers

5

List<T> is not thread-safe. What is probably happening is that, in some cases, two threads try to add an item at the same time, which can produce unexpected behavior (such as an item being lost or an exception being thrown). In your case I recommend using ConcurrentBag:

var _codigos = new ConcurrentBag<string>();

ConcurrentBag is a good fit here because it keeps a separate internal list of values for each thread, so adds from different threads rarely have to wait on a lock, and it avoids the lost-item problem. Keep in mind that it is an unordered collection.
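As a minimal sketch, the class from the question might look like this with that change. The int return value and the placeholder string built from the file name are assumptions added for illustration; the real processing is not shown in the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class Teste
{
    // ConcurrentBag<T> accepts Add calls from several threads at once.
    private ConcurrentBag<string> _codigos;

    // Assumption: returns the count so callers can check it;
    // the original method returned void and only printed the count.
    public int InsertDB(string[] files)
    {
        _codigos = new ConcurrentBag<string>();

        Parallel.ForEach(files, file => Processa(file));

        Console.WriteLine(_codigos.Count);
        return _codigos.Count;
    }

    private void Processa(string file)
    {
        // Placeholder for the real processing, which the question omits.
        string resultado = "resultado de " + file;
        _codigos.Add(resultado);
    }
}
```

With a 7000-element array, new Teste().InsertDB(files) should now report 7000 on every run. If the order of the results matters, a different collection or a sort afterward would be needed.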

    
08.02.2018 / 14:44
2

Gabriel Coletta is right. List<T> is not thread-safe.

Whenever you have parallel code writing to a single shared structure, you have to make sure the writes do not conflict.

Here is an example that shows the care you need to take (it is in C++, but it's simple):

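int valorCompartilhado = 0; // shared by all threads

void AdicionarNumero(int valorNovo)
{
    // Read-modify-write: not atomic, so unsafe without synchronization
    valorCompartilhado += valorNovo;
}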

If you have two threads running this function in parallel, you may have problems. The statement valorCompartilhado += valorNovo; may be compiled into the following instructions:

  1. Load the value of valorNovo into a register (memory inside the CPU).
  2. Load the value of valorCompartilhado into a register.
  3. Add the two registers and store the result in a register.
  4. Store the sum in valorCompartilhado.

If both threads reach step 3 at the same time, both hold the same old value of valorCompartilhado. Each one computes its own sum, one stores its result, and then the other stores its result on top of it. This means one of the results is thrown away.

If you do not limit this code to one thread at a time, you cannot control the result. If both threads call AdicionarNumero at the same time with valorCompartilhado == 5 and arguments of 3 and 1, valorCompartilhado can end up as 6, 8, or 9.
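The bad interleaving can be replayed deterministically on a single thread, which makes the 6 and 8 outcomes easier to see. The sketch below is only an illustration in C#: LostUpdateDemo, Replay, and the register variable names are hypothetical, and no real threads are involved.

```csharp
using System;

public static class LostUpdateDemo
{
    // Replays the case where both "threads" read valorCompartilhado
    // before either one writes it back. bWritesLast selects which of
    // the two writes lands last and therefore survives.
    public static int Replay(int inicial, int valorA, int valorB, bool bWritesLast)
    {
        int valorCompartilhado = inicial;

        int registradorA = valorCompartilhado; // thread A reads the old value
        int registradorB = valorCompartilhado; // thread B reads the same old value

        int somaA = registradorA + valorA;     // A's sum
        int somaB = registradorB + valorB;     // B's sum

        if (bWritesLast)
        {
            valorCompartilhado = somaA;
            valorCompartilhado = somaB;        // overwrites A's result
        }
        else
        {
            valorCompartilhado = somaB;
            valorCompartilhado = somaA;        // overwrites B's result
        }

        return valorCompartilhado;
    }
}
```

LostUpdateDemo.Replay(5, 3, 1, true) returns 6 and LostUpdateDemo.Replay(5, 3, 1, false) returns 8. The correct value of 9 only comes out when one thread finishes all four steps before the other starts.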

The way to limit code so that only one thread can enter at a time is with a lock (as Gabriel Coletta mentioned). You can also use a data structure and an algorithm that stay correct under parallel access even without a lock (such as ConcurrentBag).
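For the lock approach, a minimal sketch applied to the class from the question could look like the following. The class name TesteComLock, the _sync field, the int return value, and the placeholder processing are assumptions for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class TesteComLock
{
    // Private object used only for locking (hypothetical name).
    private readonly object _sync = new object();
    private List<string> _codigos;

    // Assumption: returns the count so callers can check it.
    public int InsertDB(string[] files)
    {
        _codigos = new List<string>();

        Parallel.ForEach(files, file => Processa(file));

        Console.WriteLine(_codigos.Count);
        return _codigos.Count;
    }

    private void Processa(string file)
    {
        // Placeholder for the real processing; it runs outside the lock
        // so the threads can still do this part in parallel.
        string resultado = "resultado de " + file;

        // Only one thread at a time may touch the list.
        lock (_sync)
        {
            _codigos.Add(resultado);
        }
    }
}
```

Keeping the lock around only the Add call means the threads wait on each other just for the list update, not for the processing itself.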

        
    08.02.2018 / 15:39