Problem concatenating csv files

I'm trying to concatenate one CSV file with another. My goal is to extract data from an HTML page every day. My routine should read a CSV file called 'dado_antigo.csv', where a previously saved DataFrame is stored. Each time it runs, it should build a new, updated DataFrame and concatenate it with the old one. It should then drop the repeated rows, so that only the new ones are added, and save the result as the new 'dado_antigo.csv' so that the routine can run again tomorrow. I'm using:

#a.to_csv('dado_antigo.csv')
b = pd.read_csv('dado_antigo.csv', 
                index_col='Data',
                parse_dates= ['Data'])
# concatenated file
c = pd.concat((b,a))
aa, bb = np.unique(c, return_index=True)
c = c.ix[bb]
c = pd.read_csv('dado_antigo.csv')

And I get this error:

  

IndexError: indices are out-of-bounds

How can I solve this? Thank you.

    
asked by anonymous 04.04.2017 / 16:10

1 answer

In pandas (this answer is based on version 0.20.1), there is a method called pandas.DataFrame.drop_duplicates, described in the pandas documentation, that can help you. It removes rows whose column values repeat an earlier row.

You can do this, for example:

import pandas as pd

df1 = pd.DataFrame(data=[['1', '2'], ['3', '4'], ['1', '2']], columns=['A', 'B'])
df2 = pd.DataFrame(data=[['5', '6'], ['7', '8'], ['1', '2']], columns=['A', 'B'])

# stack the two frames vertically
res = pd.concat([df1, df2], axis=0)

# drop repeated rows, then renumber the index from 0
res = res.drop_duplicates().reset_index(drop=True)

The result in res should contain what you need.
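For the date-indexed file in the question, the same idea might look like the sketch below. It is only an illustration under some assumptions: the helper name merge_daily and the column name valor are hypothetical, and both frames are assumed to use the dates as their index. Because drop_duplicates compares column values only and ignores the index, the sketch moves the 'Data' index into a regular column before comparing rows.

```python
import pandas as pd


def merge_daily(old, new, index_name="Data"):
    """Hypothetical helper: append `new` to `old` and drop repeated rows."""
    combined = pd.concat([old, new])
    # drop_duplicates ignores the index, so turn the date index into a
    # regular column before looking for repeated rows.
    combined = combined.rename_axis(index_name).reset_index()
    combined = combined.drop_duplicates()
    # Restore the date index and keep the rows in chronological order.
    return combined.set_index(index_name).sort_index()


# Made-up sample data; "valor" is a placeholder column name.
old = pd.DataFrame({"valor": [1.0, 2.0]},
                   index=pd.to_datetime(["2017-04-03", "2017-04-04"]))
new = pd.DataFrame({"valor": [2.0, 3.0]},
                   index=pd.to_datetime(["2017-04-04", "2017-04-05"]))
old.index.name = new.index.name = "Data"

merged = merge_daily(old, new)
```

In the daily routine, old would come from pd.read_csv('dado_antigo.csv', index_col='Data', parse_dates=['Data']) and the result could be saved back with merged.to_csv('dado_antigo.csv'). A row is dropped only when both its date and all of its values repeat an earlier row.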

Caution: .reset_index(drop=True) is not required, but I strongly recommend it. Without it, the result keeps the row labels of the original frames, so the index will contain repeated, non-sequential labels, and this can cause problems depending on what you do next.
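To see the effect of .reset_index(drop=True), the short sketch below rebuilds the two example frames and compares the index labels before and after the reset. The variable names deduped, labels_before and labels_after are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(data=[['1', '2'], ['3', '4'], ['1', '2']], columns=['A', 'B'])
df2 = pd.DataFrame(data=[['5', '6'], ['7', '8'], ['1', '2']], columns=['A', 'B'])

deduped = pd.concat([df1, df2], axis=0).drop_duplicates()

# Labels carried over from df1 and df2: each frame started counting at 0.
labels_before = deduped.index.tolist()

# Fresh labels, numbered from 0, after the reset.
labels_after = deduped.reset_index(drop=True).index.tolist()

print(labels_before)  # [0, 1, 0, 1]
print(labels_after)   # [0, 1, 2, 3]
```

With the repeated labels, a lookup such as deduped.loc[0] returns two rows instead of one. Passing ignore_index=True to pd.concat is another way to get fresh labels, although rows dropped afterwards would still leave gaps in the numbering.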

I hope I have helped.

    
14.05.2017 / 21:54