Read multiple lines of a file in parallel with C#

0

I have a file of almost 700 MB in which each line contains a JSON document. I need to process the JSON line by line and insert each one into my database.

I'm currently using the following code:

    using (StreamReader arquivo = new StreamReader(System.IO.File.OpenRead(file), Encoding.UTF8))
    {
        while (arquivo.Peek() > -1)
        {
            // process the file
        }
    }

How can I read the lines in parallel to make the process faster?

    
asked by anonymous 25.02.2016 / 18:52

2 answers

1

Because this is a text file whose lines can have different lengths, there is no efficient way to read its lines in parallel: the position where a line starts is only known after the preceding lines have been scanned for line breaks.

What you can do, however, is read the lines sequentially and process them in parallel. For example, you can use the thread pool in System.Threading, or a pool of your own in which the lines to be processed are placed in a queue and, whenever a thread is free, it picks up the next line to be processed:

// Requires: using System.IO; using System.Threading;
public void ProcessaArquivo(string file)
{
    using (StreamReader arquivo = File.OpenText(file))
    {
        string linha;
        // Lines are read sequentially; each one is handed to the thread pool.
        while ((linha = arquivo.ReadLine()) != null)
        {
            ThreadPool.QueueUserWorkItem(ProcessaLinha, linha);
        }
    }
}

private void ProcessaLinha(object parametro) {
    string json = (string)parametro;
    // do the processing
}
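
As a variation on the same idea (sequential read, parallel processing), here is a minimal sketch using Parallel.ForEach over File.ReadLines. The class name ProcessadorParalelo, the processaLinha callback and the maxParalelismo parameter are illustrative assumptions, not part of the question; the callback stands in for whatever parsing and database insert you do per line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class ProcessadorParalelo
{
    // Reads the lines lazily and sequentially, and processes them
    // on at most maxParalelismo threads at a time.
    public static void ProcessaArquivo(string file, Action<string> processaLinha, int maxParalelismo)
    {
        var opcoes = new ParallelOptions { MaxDegreeOfParallelism = maxParalelismo };

        // File.ReadLines streams the file instead of loading it all into memory.
        IEnumerable<string> linhas = File.ReadLines(file);

        // Blocks until every line has been processed.
        Parallel.ForEach(linhas, opcoes, processaLinha);
    }
}
```

It could be called as ProcessadorParalelo.ProcessaArquivo(file, json => { /* parse and insert */ }, 4). Since the call only returns after all lines have been handled, you know when the import has finished, and capping the degree of parallelism keeps the number of simultaneous database operations under control.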
    
25.02.2016 / 19:06
0

Basically, you create a queue for the lines read from the file, and then you can create multiple threads to process them. Reading the lines is fast; the work in the database is slower, so ...


Here is a code example:

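        // Requires: using System.Collections.Concurrent; using System.IO;
        //           using System.Text; using System.Threading;
        // Queue<string> is not thread-safe, so a BlockingCollection is used here instead.
        // The capacity of 10000 is an arbitrary bound that limits memory use.
        BlockingCollection<string> linhas = new BlockingCollection<string>(10000);

        private void LerLinhas()
        {
            try
            {
                // UTF-8 matches the encoding used in the question.
                using (StreamReader reader = new StreamReader("Arquivo", Encoding.UTF8))
                {
                    string linha;
                    while ((linha = reader.ReadLine()) != null)
                    {
                        linhas.Add(linha); // blocks while the queue is full
                    }
                }
            }
            finally
            {
                // Tells the consumers that no more lines will arrive.
                linhas.CompleteAdding();
            }
        }

        private void Processa()
        {
            // Waits for lines; ends once the queue is empty and marked as complete.
            foreach (string linha in linhas.GetConsumingEnumerable())
            {
                // Process the line, database, etc.
            }
        }

        private void IniciaProcesso()
        {
            Thread tLerLinhas = new Thread(LerLinhas);
            tLerLinhas.Start();

            int nThreads = 5;
            for (int i = 0; i < nThreads; i++)
            {
                Thread t = new Thread(Processa);
                t.Start();
            }
        }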

Just call the IniciaProcesso() method; it will start 5 threads processing the lines. You can change the number of threads, keeping in mind that too many threads may actually make processing slower because of excessive context switching.
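
Since the best number of threads depends on the machine and on how slow the database is, one option is simply to measure it. Below is a rough sketch of a timing helper; the names MedidorDeThreads, Medir and trabalho are hypothetical, and trabalho stands in for the real per-line processing.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class MedidorDeThreads
{
    // Calls trabalho(i) once for every i in [0, totalItens), spread over
    // nThreads threads, and returns how long the whole run took.
    public static TimeSpan Medir(int nThreads, int totalItens, Action<int> trabalho)
    {
        int proximo = -1;
        var threads = new Thread[nThreads];
        var relogio = Stopwatch.StartNew();

        for (int t = 0; t < nThreads; t++)
        {
            threads[t] = new Thread(() =>
            {
                int i;
                // Interlocked.Increment hands each index to exactly one thread.
                while ((i = Interlocked.Increment(ref proximo)) < totalItens)
                {
                    trabalho(i);
                }
            });
            threads[t].Start();
        }

        // Wait for every worker before stopping the clock.
        foreach (Thread th in threads)
        {
            th.Join();
        }

        relogio.Stop();
        return relogio.Elapsed;
    }
}
```

For example, you could loop over a few candidate values (1, 2, 5, 10, 20), call MedidorDeThreads.Medir(n, 1000, i => { /* process one sample line */ }) for each, and print the elapsed times. The point where adding threads stops helping is a reasonable value for nThreads.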

    
02.05.2017 / 21:29