Why is AddRange much faster than Add?

16

I'm working on a data integration between two databases, and I'm using Entity Framework for it.

I wrote the following code, which iterates over each record in the Situacoes table of the external database (dbExterno) and inserts it into my own database (db):

    foreach (var item in dbExterno.Situacoes)
    {
        StatusRecursos statusNew = new StatusRecursos();
        statusNew.Id = item.CodSit;
        statusNew.Nome = item.DesSit;

        db.tTipStatusRecursos.Add(statusNew); // This turned out to be very slow!
    }
    db.SaveChanges();

But I noticed that the code above was very slow, taking minutes to complete an iteration over about 3,000 records.

I then changed it to the code below, and the process took seconds. In this second version, instead of adding each item to the context with Add(), I first fill a generic list of StatusRecursos and then add the whole list to the context with AddRange().

    List<StatusRecursos> listStatus = new List<StatusRecursos>();
    foreach (var item in dbExterno.Situacoes)
    {
        StatusRecursos statusNew = new StatusRecursos();
        statusNew.Id = item.CodSit;
        statusNew.Nome = item.DesSit;
        listStatus.Add(statusNew);  // Not slow like the previous code.
    }
    db.tTipStatusRecursos.AddRange(listStatus); 
    db.SaveChanges();

I can see that it got faster, but I don't understand why filling a list first and then adding it to the context with AddRange() was so much faster.

What is the explanation for this?

    
asked by anonymous 11.12.2013 / 18:36

2 answers

15

I'm assuming you are using Entity Framework 6.

What happens is that AddRange() runs automatic change detection (DetectChanges) only once for the whole collection, whereas Add() triggers it on every single call. Try disabling automatic change detection and rerun your test using Add():

    context.Configuration.AutoDetectChangesEnabled = false;

You can find more details in this MSDN article.
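As a rough illustration of why per-call change detection adds up, here is a toy model. It is not Entity Framework code: ToyContext, Inspections, and the assumption that every detection pass inspects every entity tracked so far are hypothetical, made up only to count the work done in each case.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: this class is hypothetical and is NOT part of Entity Framework.
class ToyContext
{
    private readonly List<object> _tracked = new List<object>();

    public bool AutoDetectChangesEnabled { get; set; } = true;

    // Counts how many tracked entities the detection passes have inspected in total.
    public long Inspections { get; private set; }

    // Assumption: one detection pass inspects every entity tracked so far.
    private void DetectChanges()
    {
        Inspections += _tracked.Count;
    }

    // One detection pass per call, so n calls cost 0 + 1 + ... + (n - 1) inspections.
    public void Add(object entity)
    {
        if (AutoDetectChangesEnabled) DetectChanges();
        _tracked.Add(entity);
    }

    // A single detection pass for the whole collection.
    public void AddRange(IEnumerable<object> entities)
    {
        if (AutoDetectChangesEnabled) DetectChanges();
        _tracked.AddRange(entities);
    }
}

static class Program
{
    static void Main()
    {
        const int n = 3000;

        var oneByOne = new ToyContext();
        for (int i = 0; i < n; i++) oneByOne.Add(new object());

        var items = new List<object>();
        for (int i = 0; i < n; i++) items.Add(new object());
        var allAtOnce = new ToyContext();
        allAtOnce.AddRange(items);

        Console.WriteLine(oneByOne.Inspections);  // 4498500, i.e. n * (n - 1) / 2
        Console.WriteLine(allAtOnce.Inspections); // 0, the only pass ran on an empty context
    }
}
```

In this model the work for Add() grows quadratically with the number of records, while AddRange() pays for one pass. Setting AutoDetectChangesEnabled to false on the toy context makes both counts zero, which mirrors the experiment suggested above.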

    
11.12.2013 / 18:48
-1

I don't have specific knowledge of Entity Framework, but from what I can see, using a single AddRange operation instead of multiple Add operations gives much better performance because of the cost of performing synchronous disk operations.

Because of the way hard disks work, a disk write takes much longer than a write to RAM. (A hard disk operates on the millisecond time scale, while RAM operates on the nanosecond time scale, about 1 million times faster.)

This time can be regarded as a kind of latency, because it is (roughly) independent of the amount of data you are writing. It comes from the time needed to put the data on the bus, send the commands to the disk, position the head, actually write the data, and then receive confirmation that the data was written successfully.

The Add and AddRange functions probably wait for the data to be written to disk and only then return, because they are probably synchronous. The time difference between writing 1 record and writing 3,000 records to disk is probably not very significant, so it is much better to write all the records at once than to perform 3,000 separate operations.

An interesting analogy is this: Suppose you have to send 100 people from Rio de Janeiro to São Paulo. It's much better to wait for everyone to board the same bus and make the trip together, than to send one at a time, and only send the next one when you receive a phone call telling you that the previous one has arrived.

The performance improvement is even more dramatic when network communication with a remote database server is involved.

I recommend experimenting with different sizes of List<StatusRecursos> to find the best balance between memory usage and performance. The difference between 1 and 100 records at a time may be dramatic, while the difference between 100 and 3,000 records at a time may not be as significant. In that case it is better to save memory, especially if your application will run on mobile platforms. Another alternative is to check whether the library offers asynchronous write functions.
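One way to try different list sizes is to split the records into fixed-size batches. The sketch below is only an illustration: Batching and InBatches are hypothetical names, and plain integers stand in for the StatusRecursos objects so that it runs without a database.

```csharp
using System;
using System.Collections.Generic;

static class Batching
{
    // Hypothetical helper: yields consecutive lists of at most batchSize items.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return Iterate(source, batchSize);
    }

    private static IEnumerable<List<T>> Iterate<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        // Emit the final, possibly smaller, batch.
        if (batch.Count > 0) yield return batch;
    }
}

static class Program
{
    static void Main()
    {
        // Stand-in data: integers instead of StatusRecursos entities.
        var records = new List<int>();
        for (int i = 0; i < 3000; i++) records.Add(i);

        foreach (var batch in Batching.InBatches(records, 1000))
        {
            // With the question's context, this is where a call such as
            // db.tTipStatusRecursos.AddRange(batch) followed by db.SaveChanges()
            // would go for each batch.
            Console.WriteLine(batch.Count);
        }
    }
}
```

Changing the second argument of InBatches (for example 100, 500, 3000) and timing each run would show where the gains level off for a given workload.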

    
11.12.2013 / 19:03