(Pandas) - Group and summarize by date

Question

Hello, I am a beginner in pandas and I have a problem that I could not find, or did not understand, how to solve from the documentation or other topics. Briefly, I need to group the days of my observations into five-day intervals and, for each interval, calculate the average number of accidents. I have been trying, unsuccessfully, something like:

df = df.groupby(pd.TimeGrouper('5D'))['Acidentes'].mean()

     Data       Hora    Acidentes    Vítimas ...
0  12/02/2017    00          0          0
1  12/02/2017    01          2          1
...
24 13/02/2017    00          1          0
25 13/02/2017    01          0          0 
...
95 30/04/2017    23          3          2

The occurrences are recorded by day and by hour, but the intention is to group them into intervals of several days and then average the number of accidents in each interval.

python-3.x pandas

asked by anonymous 21.08.2018 / 05:34

1 answer

score 2 · Answer 1

Given this DataFrame:

# -*- coding: utf-8 -*-
import pandas as pd

d = {'Data': ['01/02/2017','06/02/2017','03/02/2017','02/02/2017','01/02/2017'],
     'Acidentes': [0,2,1,0,1],
     'Vitimas': [0,1,0,0,2]}
df = pd.DataFrame(data=d)
df['Data'] = pd.to_datetime(df['Data'], format='%d/%m/%Y')  # convert the strings to datetime
df = df.sort_values(['Data'])  # sort to make the output easier to read
>>> print(df)
   Acidentes       Data  Vitimas
0          0 2017-02-01        0
4          1 2017-02-01        2
3          0 2017-02-02        0
2          1 2017-02-03        0
1          2 2017-02-06        1

We can set 'Data' as the index and use resample to average over 5-day bins:

df = df.set_index('Data').resample('5D').mean()
>>> print(df)
            Acidentes  Vitimas
Data                          
2017-02-01        0.5      0.5
2017-02-06        2.0      1.0
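
For comparison with the attempt in the question: in recent pandas versions pd.TimeGrouper no longer exists, and pd.Grouper is its replacement. The following is a minimal sketch, assuming the same sample DataFrame as above. The name mean_accidents is only illustrative.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    'Data': ['01/02/2017', '06/02/2017', '03/02/2017', '02/02/2017', '01/02/2017'],
    'Acidentes': [0, 2, 1, 0, 1],
    'Vitimas': [0, 1, 0, 0, 2],
})
df['Data'] = pd.to_datetime(df['Data'], format='%d/%m/%Y')

# pd.Grouper with key= bins on a column, so 'Data' does not have to be the index.
mean_accidents = df.groupby(pd.Grouper(key='Data', freq='5D'))['Acidentes'].mean()
print(mean_accidents)
```

This should give the same 5-day bins as the resample call, restricted to the 'Acidentes' column.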

[Edit]

Converting the dates back to the original format:

df = df.reset_index()
df['Data'] = df['Data'].dt.strftime('%d/%m/%Y')  # format as dd/mm/yyyy strings
>>> print(df)
         Data  Acidentes  Vitimas
0  01/02/2017        0.5      0.5
1  06/02/2017        2.0      1.0
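
One caveat for the data in the question, which has one row per hour: taking the mean directly over a 5-day bin averages over hourly rows, not over days. If what is wanted is the average number of accidents per day within each interval, one option is to sum per day first and then average. The sketch below uses made-up hourly data (not the asker's), and the variable names are only illustrative.

```python
import pandas as pd

# Made-up hourly records for 10 days: the first five days have 1 accident
# (at 08h), the last five days have 2 (at 08h and 18h).
days = pd.date_range('2017-02-12', periods=10, freq='D')
rows = []
for i, day in enumerate(days):
    for hour in range(24):
        hit = hour == 8 or (i >= 5 and hour == 18)
        rows.append({'Data': day, 'Hora': hour, 'Acidentes': int(hit)})
df = pd.DataFrame(rows)

# Mean over every hourly row in each 5-day bin (accidents per hourly record).
per_row = df.groupby(pd.Grouper(key='Data', freq='5D'))['Acidentes'].mean()

# Daily totals first, then the mean of those totals in each 5-day bin
# (accidents per day).
daily = df.groupby('Data')['Acidentes'].sum()
per_day = daily.resample('5D').mean()

print(per_row)
print(per_day)
```

With these made-up numbers, per_day is 1.0 for the first interval and 2.0 for the second, while per_row is 24 times smaller. Which of the two is the right "average" depends on what the asker intends.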