What can cause EF performance to fall in this scenario?


I've been doing a data import for a project in ASP.NET MVC 5 with EF 6.1 and SQL Server 2008, where the source data was a txt file.

The record lines in the file were not very large, but each contained a fair amount of information to process. For example, a record line held the data of an employee, their position in the company, neighborhood, city, etc., each of which maps to a table in my model.

So, line by line, I checked the existence of each neighborhood, city, position, and all the other data, either to get the id of records that were already registered or to insert the ones that did not exist yet.

Later I chose to cache: I simply loaded the data into generic in-memory lists and checked this information in memory before going to the database.

The other entities did not produce a large number of rows, but the employee records did: more than 27,000.

Anyway, my question is about a performance problem I had when doing the import with EF. After the first thousand records (a bit more), import performance drops sharply, inserting one employee every 1 or 2 seconds.

I then noticed that if I cancelled the import and started over, the first thousand records were again imported quickly, but after a thousand records things slowed down again.

I solved it by destroying my context every thousand records and re-creating it. From then on, the import performed well from start to finish.

I would like to hear from anyone who has been through something similar: why does this happen? Could it be some Entity Framework configuration? Mine has no custom settings.

Here is the import code snippet for anyone to review:

private List<int> Matriculas;
private List<Domain.CentroCusto> CacheCentrosCusto;
private List<Domain.Local> CacheLocais;
private List<Domain.Profissao> CacheProfissoes;
private List<Domain.Cidade> CacheCidades;
private List<Domain.Bairro> CacheBairros;
//Funcionarios
public ActionResult Funcionarios(string filePath)
{
    var fileInfo = new FileInfo(filePath);
    if (!fileInfo.Exists)
        throw new Exception("Arquivo não encontrado");

    Matriculas = contexto.Funcionarios
        .Select(x => x.Matricula)
        .ToList();

    CacheCentrosCusto = contexto.CentrosCusto.ToList();
    CacheLocais = contexto.Locais.ToList();
    CacheProfissoes = contexto.Profissoes.ToList();
    CacheCidades = contexto.Cidades.ToList();
    CacheBairros = contexto.Bairros.ToList();

    var file = new System.IO.StreamReader(filePath);
    try
    {
        var count = 0;
        string line = "";
        while ((line = file.ReadLine()) != null)
        {
            string[] campos = line.Split(';');

            // Skip lines with an invalid or already imported matrícula
            int matricula;
            if (!int.TryParse(campos[0].Trim(), out matricula))
                continue;

            if (Matriculas.Contains(matricula))
                continue;

            var funcionario = new Domain.Funcionario();

            funcionario.Matricula = matricula;
            // obtendo outros dados ...

            funcionario.CentroCustoId = GetCentroCustoId(campos[34].Trim());
            funcionario.LocalId = GetLocalId(campos[35].Trim());
            funcionario.ProfissaoId = GetProfissaoId(campos[36].Trim());

            //Cidade e Bairro
            funcionario.BairroId = null;

            var bai_desc = campos[11].Trim();
            if (!string.IsNullOrEmpty(bai_desc))
            {
                var cid_uf = "";
                var cid_desc = campos[12].Trim();

                // trabalho a string ...

                var cidade = new Domain.Cidade();
                cidade.Nome = cid_desc;
                cidade.Uf = cid_uf;

                var bairro = new Domain.Bairro();
                bairro.Nome = bai_desc;

                funcionario.BairroId = GetBairroId(bairro, cidade);
            }

            try
            {
                contexto.Funcionarios.Add(funcionario);
                contexto.SaveChanges();

                Matriculas.Add(matricula);

                count++;
                if (count == 1000)
                {
                    count = 0;
                    contexto.Dispose();
                    contexto = new DataAccess.Context();
                }
            }
            catch (DbEntityValidationException e) { ... }
            catch (Exception e) { ... }
        }
    }
    finally
    {
        file.Close();
    }

    return RedirectToAction("Index");
}

My context is created in the constructor of the Controller class and destroyed in its Dispose method. Note the counter: every thousand records I destroy and re-create the context.

An example of the other lookup methods. Note: all the other lookup methods in the import code have this same structure.

private int? GetProfissaoId(string descricao)
{
    int? profissao_id = null;
    if (!string.IsNullOrEmpty(descricao))
    {
        var profissaoTemp = CacheProfissoes
            .SingleOrDefault(x => x.Descricao.ToLower().Trim() == descricao.ToLower().Trim());

        if (profissaoTemp == null)
        {
            var profissao = new Domain.Profissao { Descricao = descricao };
            try
            {
                contexto.Profissoes.Add(profissao);
                contexto.SaveChanges();

                CacheProfissoes.Add(profissao);

                profissao_id = profissao.Id;
            }
            catch (DbEntityValidationException e) { ... }
            catch (Exception e) { ... }
        }
        else
        {
            profissao_id = profissaoTemp.Id;
        }

    }
    return profissao_id;
}
    
asked by anonymous 20.06.2014 / 03:11

1 answer


This performance problem is well known. By adding objects one by one, you force Entity Framework to check the state of every object attached to the context, which is bad for performance.

There are a few ways to solve it. I'll present them and you can choose which one works best for you:

1. Turn off AutoDetectChangesEnabled

Simply do the following:

context.Configuration.AutoDetectChangesEnabled = false;

This stops EF from scanning every entity attached to the context on each operation.
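A sketch of how this fits into the question's loop (assuming the same "contexto" field; the "novosFuncionarios" list is hypothetical). Note that SaveChanges() still detects changes internally, so disabling automatic detection is safe as long as you re-enable it afterwards:

contexto.Configuration.AutoDetectChangesEnabled = false;
try
{
    foreach (var funcionario in novosFuncionarios) // hypothetical list of new entities
    {
        contexto.Funcionarios.Add(funcionario);
    }
    // SaveChanges() performs its own DetectChanges call,
    // so one call at the end is enough.
    contexto.SaveChanges();
}
finally
{
    // Re-enable so the context behaves normally afterwards.
    contexto.Configuration.AutoDetectChangesEnabled = true;
}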

2. Decrease the batch size to 100 records

Remember that you are inserting not just employees but also cities and neighborhoods, so a unit of work can reach up to 3,000 records. Keep the number of records per batch smaller and call SaveChanges() before Dispose(), just to ensure that nothing is lost:

if (count == 100)
{
    count = 0;
    contexto.SaveChanges();
    contexto.Dispose();
    contexto = new DataAccess.Context();
}

3. Use separate lists for new entities; AddRange() each batch

Keep separate lists for your new entities, fill them normally, and at the end of each batch insert all the elements at once:

context.Funcionarios.AddRange(CacheFuncionarios);
context.Cidades.AddRange(CacheCidades);
...
context.SaveChanges();
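
A sketch combining this with the batching from option 2 (the "LerFuncionariosDoArquivo" reader method is hypothetical; "contexto" is the question's context field):

// Accumulate new entities in a plain list and flush one batch at a time.
var batch = new List<Domain.Funcionario>();

foreach (var funcionario in LerFuncionariosDoArquivo(filePath)) // hypothetical reader
{
    batch.Add(funcionario);

    if (batch.Count == 100)
    {
        contexto.Funcionarios.AddRange(batch);
        contexto.SaveChanges();
        batch.Clear();

        // Optional: recreate the context so the change tracker stays small.
        contexto.Dispose();
        contexto = new DataAccess.Context();
    }
}

// Flush the remainder.
contexto.Funcionarios.AddRange(batch);
contexto.SaveChanges();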

You can even combine all of the alternatives presented, but I recommend implementing them one at a time and measuring performance; each one can change the code's behavior considerably.

See more at link

UPDATE

This answer needs an update because new alternatives have appeared for solving the bulk-insert problem with the Entity Framework.

One of them is EntityFramework.BulkInsert (which has a separate package for version 5 and another for version 4). There are also variants for other databases, such as Microsoft SQL Server Compact Edition.
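As far as I recall, the package exposes a BulkInsert extension method on DbContext that bypasses the change tracker and uses SqlBulkCopy under the hood; usage is roughly this (the exact namespace and the "novosFuncionarios" collection are assumptions, check the package's documentation):

using EntityFramework.BulkInsert.Extensions; // namespace from the package

// novosFuncionarios: a hypothetical IEnumerable<Domain.Funcionario>
contexto.BulkInsert(novosFuncionarios);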

The first part of the answer is still useful for other kinds of performance improvement, but it is no longer the best option for bulk inserts.

    
20.06.2014 / 18:06