Adding new data to an empty pandas DataFrame


I am writing code that reads several CSV files, extracts some parameters, and assembles a new pandas DataFrame, but I am having trouble building it.

Initially, I wanted to create an empty DataFrame and add the desired rows and columns as I read each CSV.

For example, let's say df starts out empty. After reading my first CSV and adding it to df, I have:

df = pd.DataFrame(columns=['01/05/2017', '01/05/2018', '01/05/2019'], index=['Ana'], data=[[0, 10, 11]])

          '01/05/2017' '01/05/2018' '01/05/2019'
'Ana'      0            10           11

After scanning the second CSV, my df would be:

          '01/05/2017' '01/05/2018' '01/05/2019' '10/06/2009'
'Ana'      0            10           11           nan
'Joao'     5            11           nan          5

That way, after processing several CSVs, I would end up with a df as long and complete as I need.

I tried building N separate dfs and appending them, but it did not work as I wanted. One reason is that if the data for 'Joao' is spread across more than one CSV, the df ends up as:

          '01/05/2017' '01/05/2018' '01/05/2019' '10/06/2009'
'Ana'      0            10           11           nan
'Joao'     nan          nan          nan          5
'Joao'     5            nan          nan          nan
'Joao'     nan          11           nan          nan

That's not the format I want from the data.

Is there any way to combine the information as desired?

    
asked by anonymous 18.11.2018 / 19:36

1 answer


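This seems simple to solve. Assuming you have already gone through all your files and inserted every row into the DataFrame, just group the rows by index and sum them. Because sum() skips NaN values, the partial rows for each name collapse into a single row: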

df.groupby(df.index).sum()
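
As a rough sketch of the "go through all your files" step, assuming each CSV has a name column followed by date columns, the per-file frames can be stacked with pd.concat before grouping. The `name` column and the in-memory CSV text below are hypothetical stand-ins for the real files, which the question does not show:

```python
import io
import pandas as pd

# Hypothetical CSV contents; in practice these would be paths to real files.
csv_texts = [
    "name,01/05/2017,01/05/2018,01/05/2019\nAna,0,10,11\n",
    "name,01/05/2017,10/06/2009\nJoao,5,5\n",
    "name,01/05/2018\nJoao,11\n",
]

# Read each CSV, using the (assumed) name column as the index.
frames = [pd.read_csv(io.StringIO(text), index_col="name") for text in csv_texts]

# concat aligns the frames on column labels; cells a file lacks become NaN.
stacked = pd.concat(frames, sort=False)

# Collapse rows that share an index label into one row per name.
combined = stacked.groupby(stacked.index).sum()
print(combined)
```

Reading everything first and grouping once at the end avoids having to merge into a growing DataFrame file by file.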

Example

import pandas as pd
import numpy as np

data = np.array([[0,10,11,np.nan],
                [np.nan,np.nan,np.nan,5],
                [5,np.nan,6,np.nan],
                [np.nan,11,np.nan,np.nan]])

df = pd.DataFrame(data, columns=['01/05/2017','01/05/2018','01/05/2019','10/06/2009'], index=['Ana','Joao','Joao','Joao'])

DataFrame:

        01/05/2017  01/05/2018  01/05/2019  10/06/2009
Ana     0.0         10.0        11.0        NaN
Joao    NaN         NaN         NaN         5.0
Joao    5.0         NaN         6.0         NaN
Joao    NaN         11.0        NaN         NaN

Using groupby:

df.groupby(df.index).sum()

Output:

        01/05/2017  01/05/2018  01/05/2019  10/06/2009
Ana     0.0         10.0        11.0        0.0
Joao    5.0         11.0        6.0         5.0
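
Note that Ana's '10/06/2009' cell comes out as 0.0 rather than NaN, because by default sum() returns 0 for a group with no valid values. If the missing cells should stay NaN, as in the table the question asks for, one option is the min_count parameter of sum(). A minimal sketch using the same example data:

```python
import numpy as np
import pandas as pd

data = np.array([[0, 10, 11, np.nan],
                 [np.nan, np.nan, np.nan, 5],
                 [5, np.nan, 6, np.nan],
                 [np.nan, 11, np.nan, np.nan]])

df = pd.DataFrame(data,
                  columns=['01/05/2017', '01/05/2018', '01/05/2019', '10/06/2009'],
                  index=['Ana', 'Joao', 'Joao', 'Joao'])

# min_count=1 requires at least one non-NaN value per group and column;
# groups with none produce NaN instead of 0.
merged = df.groupby(df.index).sum(min_count=1)
print(merged)
```

Keep in mind that sum() adds values together if the same name and date appear in more than one row, so this approach assumes each cell is filled at most once per name.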
    
19.11.2018 / 05:51