I am writing code that reads several CSV files, extracts some parameters, and builds a new DataFrame with pandas, but I am having trouble constructing it.
Initially, I wanted to create an empty DataFrame and add the desired rows and columns as I read each CSV.
For example, let's say df starts out empty. After reading my first CSV and adding it to df, I have:
import pandas as pd
df = pd.DataFrame(columns=['01/05/2017', '01/05/2018', '01/05/2019'], index=['Ana'], data=[[0, 10, 11]])
        '01/05/2017'  '01/05/2018'  '01/05/2019'
'Ana'   0             10            11
After scanning the second CSV, my df would be:
        '01/05/2017'  '01/05/2018'  '01/05/2019'  '10/06/2009'
'Ana'   0             10            11            nan
'Joao'  5             11            nan           5
That way, after processing all the CSVs, I would end up with a df as long and complete as I need.
I tried building N separate DataFrames and appending them one after another, but it did not work as I wanted. One reason is that if the data for 'Joao' happens to be spread across more than one CSV, the df ends up as:
        '01/05/2017'  '01/05/2018'  '01/05/2019'  '10/06/2009'
'Ana'   0             10            11            nan
'Joao'  nan           nan           nan           5
'Joao'  5             nan           nan           nan
'Joao'  nan           11            nan           nan
That's not the format I want from the data.
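For reference, here is a minimal sketch that reproduces the duplicated rows. It assumes each CSV ends up as a small frame indexed by name; the frames below are hand-built stand-ins rather than my real files, and the variable names are only illustrative.

```python
import pandas as pd

# Hypothetical stand-ins for the frames extracted from each CSV.
ana = pd.DataFrame([[0, 10, 11]], index=['Ana'],
                   columns=['01/05/2017', '01/05/2018', '01/05/2019'])
joao_a = pd.DataFrame([[5]], index=['Joao'], columns=['10/06/2009'])
joao_b = pd.DataFrame([[5]], index=['Joao'], columns=['01/05/2017'])
joao_c = pd.DataFrame([[11]], index=['Joao'], columns=['01/05/2018'])

# Row-wise concatenation aligns the columns (filling gaps with NaN),
# but it keeps one row per source frame, so 'Joao' appears three times.
stacked = pd.concat([ana, joao_a, joao_b, joao_c], sort=False)
print(stacked)
```

The printed frame has four rows ('Ana' once, 'Joao' three times) instead of the single 'Joao' row I am after.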
Is there any way to combine the information so that each name ends up on a single row, as in the desired output above?