How to merge multiple text files into one?

9

Does anyone know how to select all text files from the same directory and merge all of them into just one final text file?

Example: In the X folder, I have the 1.txt, 2.txt, and 3.txt files. I need to merge the contents of all into just one text file.

I tried this code, which compiles, but when it runs it throws an IndexOutOfRangeException.

string[] stringArray = Directory.GetFiles(@"C:\InventX", "*.txt");
System.Text.StringBuilder stringBuilder = new System.Text.StringBuilder();
for (int i = 0; i <= stringArray.Count(); i++)
{
    stringBuilder.Append(System.IO.File.ReadAllText(stringArray[i]));
}
string bulidOutput = stringBuilder.ToString();
string newFilePath = @"C:\Lala.txt";
System.IO.File.WriteAllText(newFilePath, bulidOutput);
    
asked by anonymous 22.05.2014 / 16:15

4 answers

10

The error in your code is due to this condition:

for (int i = 0; i <= stringArray.Count(); i++)

should be

for (int i = 0; i < stringArray.Count(); i++)

As written, the last iteration runs with i == stringArray.Count(). Since arrays are zero-indexed, the last valid index is stringArray.Count() - 1, so accessing stringArray[i] throws an IndexOutOfRangeException.
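For reference, here is a minimal sketch of the question's code with the corrected bound. The MergeExample class, the MergeAll method and its parameters are illustrative names, not part of the question, and the array's Length property is used instead of the LINQ Count() extension.

```csharp
using System.IO;
using System.Text;

public static class MergeExample
{
    // Hypothetical helper: concatenates every file in sourceDir that matches
    // pattern and writes the result to targetPath.
    public static void MergeAll(string sourceDir, string pattern, string targetPath)
    {
        string[] files = Directory.GetFiles(sourceDir, pattern);
        var builder = new StringBuilder();

        // Valid indexes run from 0 to files.Length - 1, hence "<" and not "<=".
        for (int i = 0; i < files.Length; i++)
        {
            builder.Append(File.ReadAllText(files[i]));
        }

        File.WriteAllText(targetPath, builder.ToString());
    }
}
```

This sketch assumes targetPath lies outside sourceDir, as in the question, so the output file is never picked up as an input.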

In addition, an efficient way to merge files is to read them in chunks and write each chunk as soon as it is read. You can vary the buffer size and compare the performance to see which value fits your scenario best.

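// Requires: using System; using System.IO;
public void UnirFicheiros(string directorio, string filtro, string ficheiroUnido)
{
    // Fail early when the source directory does not exist.
    if (!Directory.Exists(directorio))
        throw new DirectoryNotFoundException();

    const int bufferSize = 1 * 1024;
    string caminhoUnido = Path.Combine(directorio, ficheiroUnido);

    // List the inputs before creating the output file.
    string[] ficheiros = Directory.GetFiles(directorio, filtro);

    using (var outputFile = File.Create(caminhoUnido))
    {
        foreach (string file in ficheiros)
        {
            // Skip an output file left over from an earlier run; it is open for writing.
            if (string.Equals(file, caminhoUnido, StringComparison.OrdinalIgnoreCase))
                continue;

            using (var inputFile = File.OpenRead(file))
            {
                var buffer = new byte[bufferSize];
                int bytesRead;
                // Copy chunk by chunk until the end of the input file.
                while ((bytesRead = inputFile.Read(buffer, 0, buffer.Length)) > 0)
                {
                    outputFile.Write(buffer, 0, bytesRead);
                }
            }
        }
    }
}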
    
22.05.2014 / 16:41
9

Here's a simple example:

static void Main(string[] args)
{
    string diretorio = @"C:\teste";

    String[] listaDeArquivos = Directory.GetFiles(diretorio);

    if (listaDeArquivos.Length > 0)
    {
        string caminhoArquivoDestino = @"C:\teste\saida.txt";

        FileStream arquivoDestino = File.Open(caminhoArquivoDestino, FileMode.OpenOrCreate);
        arquivoDestino.Close();

        List<String> linhasDestino = new List<string>();

        foreach (String caminhoArquivo in listaDeArquivos)
        {
            linhasDestino.AddRange(File.ReadAllLines(caminhoArquivo));
        }

        File.WriteAllLines(caminhoArquivoDestino, linhasDestino.ToArray());
    }

}

Experiment with these methods and adapt them to your needs.

    
22.05.2014 / 16:29
8

Since the approaches so far did not look good to me, I decided to write a compilable example that solves the problem in a generic way.

using System;
using System.IO;
using Util.IO;

public class MergeFiles {
    public static void Main(string[] args) {
        int bufferSize;
        FileUtil.MergeTextFiles(args[0], args[1], args[2], (int.TryParse(args[3], out bufferSize) ? bufferSize : 0));
    }
}

namespace Util.IO {
    public static class FileUtil {
        public static void MergeTextFiles(string targetFileName, string sourcePath, string searchPattern = "*.*", int bufferSize = 0) {
            // Default to the current directory when no source path is given.
            if (string.IsNullOrEmpty(sourcePath)) {
                sourcePath = Directory.GetCurrentDirectory();
            }
            if (sourcePath.IndexOfAny(System.IO.Path.GetInvalidPathChars()) != -1) {
                throw new ArgumentException("The specified source directory contains invalid characters", "sourcePath");
            }
            if (string.IsNullOrEmpty(targetFileName)) {
                throw new ArgumentException("The target file name must be specified", "targetFileName");
            }
            if (targetFileName.IndexOfAny(System.IO.Path.GetInvalidFileNameChars()) != -1) {
                throw new ArgumentException("The target file name contains invalid characters", "targetFileName");
            }
            var targetFullFileName = Path.Combine(sourcePath, targetFileName);
            if (bufferSize == 0) {
                File.Delete(targetFullFileName);
                foreach (var file in Directory.GetFiles(sourcePath, searchPattern)) {
                    if (file != targetFullFileName) {
                        File.AppendAllText(targetFullFileName, File.ReadAllText(file));
                    }
                }
            } else {
                using (var targetFile = File.Create(targetFullFileName, bufferSize)) {
                    foreach (var file in Directory.GetFiles(sourcePath, searchPattern)) {
                        if (file != targetFullFileName) {
                            using (var sourceFile = File.OpenRead(file)) {
                                var buffer = new byte[bufferSize];
                                int bytesRead;
                                while ((bytesRead = sourceFile.Read(buffer, 0, buffer.Length)) > 0) {
                                    targetFile.Write(buffer, 0, bytesRead);
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

I placed it on GitHub for future reference.

The Main() method is there only to make a quick test easier; it is not production-ready. The MergeTextFiles() method is quite reasonable for real use. It is not 100%: I did not write unit tests for it, I did not document it, and I did not think through every possible situation, but it is well on its way.

You can choose a buffer size if you want finer control over how the copy is done. If you think you will never need this, you can remove that part of the method. But it does no harm to leave it in, since the default is to copy each file whole, according to the criteria of the current .NET implementation.

Possible improvements

Some improvements can be made to make it more generic or to add functionality. You could, for example, add a last parameter extraNewLineOptions extraNewLineOption = extraNewLineOptions.NoExtraNewLine and an enumeration enum extraNewLineOptions { NoExtraNewLine, SelectiveExtraNewLine, AlwaysExtraNewLine }.

This would allow an extra line break to be placed at the end of each file, so that the text of consecutive files does not run together. It can be useful, but in most cases it is not necessary, so it would be disabled by default. I leave the implementation to each reader's creativity, mainly because SelectiveExtraNewLine, which would add a line break only when the file does not already end with one, is not so trivial to implement. You can create an overload to make the parameters easier to use.
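One possible shape for the line-break option is sketched below. It is only an illustration under several assumptions: the NewLineMerge class and AppendFile method are hypothetical names, the enumeration is spelled in PascalCase, each file is read whole (as in the bufferSize == 0 branch), and an empty file gets no extra line break in the selective mode.

```csharp
using System;
using System.IO;

// Hypothetical enumeration; the values follow the names suggested in the text.
public enum ExtraNewLineOptions { NoExtraNewLine, SelectiveExtraNewLine, AlwaysExtraNewLine }

public static class NewLineMerge
{
    // Appends the text of sourceFile to targetFile, optionally followed by a line break.
    public static void AppendFile(string targetFile, string sourceFile,
        ExtraNewLineOptions option = ExtraNewLineOptions.NoExtraNewLine)
    {
        string text = File.ReadAllText(sourceFile);

        // A file "ends with a line break" if its last character is \n or \r.
        bool endsWithBreak = text.Length > 0
            && (text[text.Length - 1] == '\n' || text[text.Length - 1] == '\r');

        // Always: add a break unconditionally. Selective: only for non-empty
        // files that do not already end with one.
        bool addBreak = option == ExtraNewLineOptions.AlwaysExtraNewLine
            || (option == ExtraNewLineOptions.SelectiveExtraNewLine
                && text.Length > 0 && !endsWithBreak);

        File.AppendAllText(targetFile, addBreak ? text + Environment.NewLine : text);
    }
}
```

In the buffered branch the same decision would have to be made from the last byte read from each file, which is part of what makes the selective mode less trivial there.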

Another improvement is to allow copying to be done asynchronously. Very useful if you have large volumes of files.
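A rough sketch of an asynchronous variant follows, assuming .NET 4.5 or later so that async/await and Stream.CopyToAsync are available. The AsyncMerge class, the MergeFilesAsync method, its parameter order and the default buffer size are assumptions made for this illustration, not part of the code above.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class AsyncMerge
{
    // Hypothetical async variant: copies each matching file into targetFile, one after another.
    public static async Task MergeFilesAsync(string sourcePath, string searchPattern,
        string targetFile, int bufferSize = 81920)
    {
        string targetFullName = Path.GetFullPath(targetFile);

        // List the inputs before the output file is created.
        string[] files = Directory.GetFiles(sourcePath, searchPattern);

        using (var target = new FileStream(targetFullName, FileMode.Create,
            FileAccess.Write, FileShare.None, bufferSize, useAsync: true))
        {
            foreach (string file in files)
            {
                // Skip the output file itself if it lives in the source directory.
                // Case-insensitive comparison assumes Windows-style paths.
                if (string.Equals(Path.GetFullPath(file), targetFullName,
                    StringComparison.OrdinalIgnoreCase))
                {
                    continue;
                }

                using (var source = new FileStream(file, FileMode.Open,
                    FileAccess.Read, FileShare.Read, bufferSize, useAsync: true))
                {
                    // Copies without blocking the calling thread during I/O.
                    await source.CopyToAsync(target, bufferSize);
                }
            }
        }
    }
}
```

The files are still copied sequentially so that their contents are not interleaved in the target; only the waiting on I/O becomes non-blocking.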

The method could also be broken into smaller parts.

Depending on the version of .NET

I used features that run on almost any version of .NET. If it is guaranteed to run only on later versions, the parameter checks can be replaced with Contract.Requires(). You could even remove all of them, since these problems are also checked in the methods being called. Of course, you would then lose the information about exactly where the error originated.

Unfortunately, there is no public method to check the validity of the wildcard pattern in advance. If necessary, you can check how it is implemented in the .NET sources (and possibly in the Mono sources as well).

If you have C# 6 (through Roslyn), some improvements can be made.

You could add using Util.IO.FileUtil; (written using static Util.IO.FileUtil; in the released C# 6) and then call the method directly: MergeTextFiles("combo.txt", ".", "*.txt").

In addition, the declarations int bufferSize; and int bytesRead; in the Main() and MergeTextFiles() methods could be made inline at their point of use, in TryParse() and while respectively, for example int.TryParse(args[3], out var bufferSize) (inline out var declarations were ultimately released in C# 7).

See the C# 6 example on .NET Fiddle and on Coding Ground. I also put it on GitHub for future reference.

    
22.05.2014 / 20:19
4

With StreamWriter

String[] arquivos = Directory.GetFiles(@".\Txts", "*.txt");
// The using block flushes and disposes the writer even if an exception occurs.
using (StreamWriter strWriter = new StreamWriter(@".\Final.txt"))
{
    foreach (String arquivo in arquivos)
    {
        // WriteLine adds a line break after the contents of each file.
        strWriter.WriteLine(File.ReadAllText(arquivo));
    }
}


22.05.2014 / 16:42