Python - how to merge multiple csv files

I have 4 folders, each containing CSV files of 3 types (ap, peers, visits).

I'm a beginner in Python, and I want to write a script that merges all the peers files into a single file containing the lines of every peers file found. In addition, I want to add a column called "student" to the header and, for each line written to the final peers file, put the respective student at the end.

import glob
import sys

# The main folder is the first command-line argument (expected to end with a backslash)
mainfolder = sys.argv[1]
allfolders = glob.glob(mainfolder + '*\\')

with open(mainfolder + "finalpeers\\totalpeers.csv", "w") as finalPeersFile:

    newpheader = '"_id","ssid","bssid","dateTime","latitude","longitude","student"\n'
    finalPeersFile.write(newpheader)

    for folder in allfolders:
        student = folder.split('\\')[-2]
        filesTomerge = glob.glob(folder + '*.csv')

        for filename in filesTomerge:
            if (isPeers(filename)):
                with open(filename, 'r') as p:
                    for line in p:
                        finalPeersFile.write(line)

My code does merge the files, but since every file has the same header and some files contain only a header, the output ends up with many repeated header lines. Also, I cannot just take the header from the first line and append "student" to it, because the line ends with a "hidden" newline character; I think this is something particular to Python. And although I have the student to add at the end of each line, I cannot simply concatenate it to the string (line + student).

Final file:

How can I remove the duplicate headers, or merge the files so that the headers are not copied in the first place?

P.S.: I apologize if this question has already been asked (I searched a lot, but nothing I found helped me solve the problem).

    
asked by anonymous 10.07.2017 / 18:48

1 answer

The "hidden" newline can be removed from a string with the rstrip() method.

The header of each input file can be skipped by calling next() on the file object.
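
A minimal illustration of both calls, using an in-memory file (io.StringIO) in place of a real CSV. The sample rows and the student name "alice" are made up for the example:

```python
import io

# Stand-in for an opened CSV file; the contents are invented for illustration.
sample = io.StringIO('"_id","ssid"\n"1","home"\n"2","cafe"\n')

# next() consumes the first line (the header); the None default
# keeps it from raising StopIteration if the file is empty.
next(sample, None)

merged = []
for line in sample:
    # rstrip("\r\n") drops the trailing newline so the new field
    # lands on the same row instead of on the following line.
    merged.append(line.rstrip("\r\n") + ',"alice"\n')

print("".join(merged), end="")
```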

Let's see:

from os import listdir
from os.path import isfile, join

# Directory to scan
diretorio="/tmp"

# Get the list of CSV files in the directory
ficheiros = [f for f in listdir(diretorio) if (isfile(join(diretorio, f)) and f.endswith('.csv')) ]

# Open the output file...
saida = open( "saida.csv", "a" )

# For each file...
for f in ficheiros:

    # Open the file (listdir returns bare names, so join the directory back on)
    csv = open( join(diretorio, f) )

    # Skip the CSV header; the None default avoids StopIteration on an empty file
    next(csv, None)

    # Compute student (placeholder value)...
    student = 1

    # For each of the remaining lines in the file...
    for linha in csv:
        # Comma separator, to match the comma-separated header in the question
        linha = linha.rstrip() + ',' + str(student) + '\n'
        saida.write(linha)

    # Close the input CSV file
    csv.close()

# Close the output CSV file
saida.close()
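
For the folder layout described in the question, the same idea can be sketched with the standard csv module (Python 3). This is only a sketch under assumptions: each student is taken to have their own subfolder whose name is the student identifier, and is_peers() is a hypothetical stand-in for the question's isPeers(), which was not shown; here it simply checks whether the file name contains "peers".

```python
import csv
import glob
import os

HEADER = ["_id", "ssid", "bssid", "dateTime", "latitude", "longitude", "student"]


def is_peers(filename):
    # Hypothetical stand-in for the question's isPeers(): assumes the
    # file name itself contains the word "peers".
    return "peers" in os.path.basename(filename).lower()


def merge_peers(main_folder, output_path):
    """Merge every peers CSV found in main_folder/<student>/ into output_path."""
    output_abs = os.path.abspath(output_path)
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out, quoting=csv.QUOTE_ALL)
        writer.writerow(HEADER)  # the header is written exactly once

        # A pattern ending in a separator matches directories only.
        for folder in sorted(glob.glob(os.path.join(main_folder, "*", ""))):
            # Assumption: the folder name is the student identifier.
            student = os.path.basename(os.path.normpath(folder))

            for filename in sorted(glob.glob(os.path.join(folder, "*.csv"))):
                # Never read the file we are currently writing.
                if os.path.abspath(filename) == output_abs:
                    continue
                if not is_peers(filename):
                    continue
                with open(filename, newline="") as src:
                    reader = csv.reader(src)
                    next(reader, None)  # skip this file's header, if any
                    for row in reader:
                        if row:  # ignore blank lines
                            writer.writerow(row + [student])
```

It could be called as merge_peers(sys.argv[1], "totalpeers.csv"). Letting csv.reader and csv.writer split and join the fields avoids handling the trailing newline by hand, and files that contain only a header contribute no rows.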
    
10.07.2017 / 22:54