Pandas Conditional Sum

Hello. I have the following situation:

 import pandas as pd
 df1 = pd.DataFrame({'Key': ['a','b','c','a','c','a','b','c'], 'Value': [9.2,8.6,7.2,8.3,8.5,2.1,7.4,1.1]})
 df2 = pd.DataFrame({'Key': ['a','b','c']})

And I would like the following result:

In [0]: df2
Out[0]:
  Key  soma
0   a  19.6
1   b  16.0
2   c  16.8

The only way I know how to do this is:

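df2['soma'] = 0.0  # create the column before filling it
for ind, row in df2.iterrows():
    # sum the df1 values whose Key matches this row's Key
    df2.loc[ind, 'soma'] = df1.loc[df1.Key == row.Key, 'Value'].sum()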

But this is so slow that it makes my run unfeasible, because the data set is very large.

Regards to all.

    
asked by anonymous 30.11.2016 / 11:41

1 answer

Following an answer on SOen, another way to get the sum column is to drop the loop and use a groupby (aggregation) to create the new column:

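# aggregate per Key, then look up each df2 key (transform would align on row index, not on Key)
df2['soma'] = df2['Key'].map(df1.groupby('Key')['Value'].agg(np.sum))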

After execution:

In [35]: df2
Out[35]:
  Key  soma
0   a  19.6
1   b  16.0
2   c  16.8

If you are not using the numpy library (using it is recommended), replace np.sum with the string 'sum', which pandas also accepts.
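One thing to watch for with this kind of assignment is that pandas aligns on the row index, not on Key, so the result only lines up when df2 happens to list the keys in the same order as the first rows of df1. Here is a minimal sketch that looks up each sum by key instead. It reuses df1 from the question, but the reordered df2, the extra key 'd', the name sums, and the choice to fill missing keys with 0.0 are all assumptions made for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'Key': ['a', 'b', 'c', 'a', 'c', 'a', 'b', 'c'],
                    'Value': [9.2, 8.6, 7.2, 8.3, 8.5, 2.1, 7.4, 1.1]})

# Hypothetical df2: keys in a different order, plus 'd', which has no rows in df1
df2 = pd.DataFrame({'Key': ['c', 'a', 'd', 'b']})

# One sum per key, as a Series indexed by Key
sums = df1.groupby('Key')['Value'].sum()

# Look up each df2 key in that Series; keys missing from df1 give NaN,
# which is replaced with 0.0 here (an assumption about the desired result)
df2['soma'] = df2['Key'].map(sums).fillna(0.0)

print(df2)
```

Since the grouping runs once over df1 and the lookup is vectorized, this should avoid the per-row cost of the iterrows loop.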

    
30.11.2016 / 13:49