Merge CSV with Python

6

I have a collection of dozens of CSV files. Most of them share the same fields, but some have unique fields. I want to use Python to merge them into a single CSV file with a global header that includes all the fields from all the files. I'm using the csv module, but so far without success, because the data does not end up in the right columns.

    
asked by anonymous 11.06.2015 / 21:55

1 answer

7

I had a similar problem some time ago. I adapted my solution a little to your needs. You may have to change some things, especially the delimiter.

from glob import glob
import csv

"""
    This program must be run from the directory that contains the CSV files.
    The output goes to the file consolidated.csv.
"""
def create_global_header(files):
    """
        Build the global header from the headers of all the CSV files.
    """
    consolidated_header = ['filename']
    for file in files:
        # newline='' (Python 3) lets the csv module handle line endings itself
        with open(file, 'r', newline='') as icsv:
            reader = csv.DictReader(icsv, dialect='excel', delimiter=';')
            # fieldnames is None for an empty file
            for field in reader.fieldnames or []:
                if field not in consolidated_header:
                    consolidated_header.append(field)
    return consolidated_header

def global_csv(ifile, global_header, ofile):
    """
    Read the CSV file ifile and append its rows to the file ofile.
    Since DictWriter and DictReader are used, and the writer is given the
    global header, each value goes to the column of its own field.
    """
    with open(ofile, 'a', newline='') as ocsv, open(ifile, 'r', newline='') as icsv:
        ireader = csv.DictReader(icsv, dialect='excel', delimiter=';')
        # Fields that ifile does not have are written empty (restval defaults to '')
        owriter = csv.DictWriter(ocsv, global_header, dialect='excel', delimiter=';')
        for row in ireader:
            row['filename'] = ifile
            owriter.writerow(row)


if __name__ == '__main__':
    # Skip the output file in case it is left over from a previous run
    files = [f for f in glob('*.csv') if f != 'consolidated.csv']
    global_header = create_global_header(files)
    with open("consolidated.csv", 'w', newline='') as mycsv:
        writer = csv.DictWriter(mycsv, global_header, dialect='excel', delimiter=';')
        writer.writeheader()
    for file in files:
        global_csv(file, global_header, 'consolidated.csv')
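
To see why the values land in the right columns, here is a minimal in-memory sketch of the same two-pass idea. The sample data, the file names `a.csv` and `b.csv`, and the helper `merge_samples` are made up for illustration; `io.StringIO` stands in for real files, and I am assuming `;` as the delimiter as in the script above.

```python
import csv
import io

# Two hypothetical in-memory "files" with partly different columns.
SAMPLE_A = "id;name\n1;Ana\n2;Rui\n"
SAMPLE_B = "id;city\n3;Lisboa\n"


def merge_samples(samples, delimiter=";"):
    """Merge a {name: csv_text} dict into one CSV string with a global header."""
    # First pass: collect every field name, in order of first appearance.
    header = ["filename"]
    for text in samples.values():
        reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
        for field in reader.fieldnames or []:
            if field not in header:
                header.append(field)

    # Second pass: write the rows. DictWriter places each value by field
    # name, and restval fills the columns a given file does not have.
    out = io.StringIO()
    writer = csv.DictWriter(out, header, delimiter=delimiter,
                            restval="", lineterminator="\n")
    writer.writeheader()
    for name, text in samples.items():
        for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
            row["filename"] = name
            writer.writerow(row)
    return out.getvalue()


merged = merge_samples({"a.csv": SAMPLE_A, "b.csv": SAMPLE_B})
print(merged)
```

The rows from `a.csv` get an empty `city` column and the row from `b.csv` gets an empty `name` column, because the writer looks each value up by its field name instead of by position.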
    
11.06.2015 / 22:51