I have 4 folders, and each one is full of CSV files of 3 types (ap, peers, visits).
I'm a beginner in Python, but I want to write a script that merges all the peers files, so that I end up with a single file containing the lines of every peers file found. In addition, I want to add a column called "student" to the header, and for each line written to the final peers file I want to put the respective student at the end.
    import glob
    import sys

    # isPeers() is defined elsewhere in my script; it checks whether a file is a peers csv
    mainfolder = sys.argv[1]  # main folder path, expected to end with a backslash
    allfolders = glob.glob(mainfolder + '*\\')
    with open(mainfolder + "finalpeers\\totalpeers.csv", "w") as finalPeersFile:
        newpheader = '"_id","ssid","bssid","dateTime","latitude","longitude","student"\n'
        finalPeersFile.write(newpheader)
        for folder in allfolders:
            # the folder name is the student
            student = folder.split('\\')[-2]
            filesTomerge = glob.glob(folder + '*.csv')
            for filename in filesTomerge:
                if isPeers(filename):
                    with open(filename, 'r') as p:
                        # copies every line, including each file's header
                        for line in p:
                            finalPeersFile.write(line)
My code does merge the files, but since every file has the same header, and some files contain nothing but the header, I get lots of repeated header lines in the result. Also, I cannot just take the header from the first line and add "student" to it, because there is a "hidden" newline at the end of the line; I think it is something particular to Python. And although I have the student to add at the end of each line, I cannot simply concatenate it to the string (line + student), because it ends up after that newline.
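For reference, the "hidden" newline is the '\n' character that every line read from a file keeps at its end. Below is a minimal sketch of one way to handle it, assuming each peers file starts with exactly one header line. The names append_student and copy_rows are hypothetical and not part of the script above, and the sample row is made up for illustration.

```python
import io


def append_student(line, student):
    """Return the CSV line with a quoted student value as the last column."""
    # A line read from a file still ends with its newline, so strip it,
    # add the new column, then put a single newline back.
    return line.rstrip('\r\n') + ',"' + student + '"\n'


def copy_rows(src_lines, dest, student):
    """Write the data rows of src_lines to dest, skipping the header line.

    src_lines can be an open file or any iterable of lines.
    Returns the number of rows written.
    """
    rows = iter(src_lines)
    next(rows, None)  # discard the header; None avoids an error on empty files
    count = 0
    for line in rows:
        if line.strip():  # ignore blank lines
            dest.write(append_student(line, student))
            count += 1
    return count


if __name__ == "__main__":
    merged = io.StringIO()  # stands in for totalpeers.csv
    sample = ['"_id","ssid"\n', '"1","home-wifi"\n']  # header + one made-up row
    copy_rows(sample, merged, "student1")
    print(merged.getvalue(), end="")
```

In the loop above, the inner "for line in p" would become a call such as copy_rows(p, finalPeersFile, student), so the header is written only once, before the loop. This simple string approach assumes no field contains an embedded newline; the csv module would be the safer choice if that can happen.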
Final file:
How can I remove the duplicated headers, or merge the files in a way that does not write the headers again?
P.S.: I apologize if this question has already been asked (I have searched a lot, but nothing I found helped me solve the problem).