How to construct time series with frequencies different from the original?

I have a dataframe with daily precipitation data, with dates from 01/01/1900 to 31/12/2010, for example:

# Data             Est_1      Est_2      Est_3   
# 17/12/2010          NA          0          0   
# 18/12/2010          NA          0          0    
# 19/12/2010          NA        1.7          0     
# 20/12/2010          NA        1.1       37.2    
# 21/12/2010          NA       88.5         50   
# 22/12/2010          NA        30           0 

I want to extract sub-series from this dataframe containing:

  • minimum annual daily precipitation
  • average annual daily precipitation
  • maximum annual daily precipitation

How can I do this in R or Python?

    
asked by anonymous 15.09.2015 / 02:08

3 answers

In R, you can use the lubridate package together with dplyr; lubridate makes manipulating dates much easier.

See an example:

library(lubridate)

dados <- data.frame(
  data = seq(dmy('01/01/1900'),dmy('31/12/2010'), by = '1 day'),
  valor = 1:40542
  )

Computing the statistics per year:

> library(dplyr)
> dados %>% 
+   group_by(year(data)) %>% 
+   summarise(media = mean(valor), minimo = min(valor), maximo = max(valor))
Source: local data frame [111 x 4]

   year(data)  media minimo maximo
1        1900  183.0      1    365
2        1901  548.0    366    730
3        1902  913.0    731   1095
4        1903 1278.0   1096   1460
5        1904 1643.5   1461   1826
6        1905 2009.0   1827   2191
7        1906 2374.0   2192   2556
8        1907 2739.0   2557   2921
9        1908 3104.5   2922   3287
10       1909 3470.0   3288   3652
..        ...    ...    ...    ...

Computing them per month of each year:

> dados %>% group_by(year(data), month(data)) %>% 
+   summarise(media = mean(valor), min = min(valor), maximo = max(valor))
Source: local data frame [1,332 x 5]
Groups: year(data)

   year(data) month(data) media min maximo
1        1900           1  16.0   1     31
2        1900           2  45.5  32     59
3        1900           3  75.0  60     90
4        1900           4 105.5  91    120
5        1900           5 136.0 121    151
6        1900           6 166.5 152    181
7        1900           7 197.0 182    212
8        1900           8 228.0 213    243
9        1900           9 258.5 244    273
10       1900          10 289.0 274    304
..        ...         ...   ... ...    ...

Besides year() and month(), lubridate has functions to extract other elements of a date, such as day() and wday().

For a detailed explanation of lubridate, see this link: link

    
15.09.2015 / 19:20

Or, if you prefer, use the data.table package and a little regex :)

library(data.table)
library(lubridate)
library(stringr)

dTbl = data.table(data=seq(dmy('01/01/1900'),
                           dmy('31/12/2010'),
                           by='1 day'),
                  valor=1:40542)

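# Dates become "YYYY-MM-DD" strings; the lookarounds pull out year and month.
# stringr's old perl() wrapper is deprecated; ICU regex supports lookarounds directly.
dTbl[, year := str_extract(as.character(data), '^[0-9]+(?=-)')]
dTbl[, month := str_extract(as.character(data), '(?<=-)[0-9]+(?=-)')]

# Per-year statistics (as.numeric keeps median's return type the same in every group)
dTbl[, .(median=median(as.numeric(valor)),
         mean=mean(valor),
         min=min(valor),
         max=max(valor)), by=year]

# Per-month statistics within each year
dTbl[, .(median=median(as.numeric(valor)),
         mean=mean(valor),
         min=min(valor),
         max=max(valor)), by=.(year, month)]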
    
16.09.2015 / 03:51

In Python, you can use the pandas package to manipulate dataframes.

In this case, I suggest grouping the data by year and then computing the statistics you need.
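A minimal sketch of that approach, assuming the same dummy series used in the R answers (one value per day from 1900-01-01 to 2010-12-31, numbered 1, 2, 3, ...) rather than the real precipitation file. The names `dados`, `anual` and `mensal` are illustrative. With the real data you would load the file and parse the `Data` column (dd/mm/yyyy) into the index instead; calling `.agg(...)` without selecting `"valor"` then summarizes every station column (`Est_1`, `Est_2`, ...) at once.

```python
import numpy as np
import pandas as pd

# Dummy daily series mirroring the R answers (assumption, not the real data):
# one row per day from 1900-01-01 to 2010-12-31, values 1, 2, 3, ...
datas = pd.date_range("1900-01-01", "2010-12-31", freq="D")
dados = pd.DataFrame({"valor": np.arange(1, len(datas) + 1)}, index=datas)

# With the real file you would do something like (hypothetical path):
#   dados = pd.read_csv("precip.csv", sep=r"\s+")
#   dados.index = pd.to_datetime(dados.pop("Data"), format="%d/%m/%Y")

# Annual statistics: group the daily values by calendar year.
# mean/min/max skip NaN (the NA entries) by default.
anual = dados.groupby(dados.index.year.rename("year"))["valor"].agg(["mean", "min", "max"])
print(anual.head())

# Monthly statistics: group by (year, month) pairs.
mensal = dados.groupby(
    [dados.index.year.rename("year"), dados.index.month.rename("month")]
)["valor"].agg(["mean", "min", "max"])
print(mensal.head())
```

The first rows match the R output: for 1900 the mean is 183.0, the minimum 1 and the maximum 365. Grouping by `index.year` rather than using `resample` keeps the year as a plain integer key, which is convenient when the result is joined back to other yearly data.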

For more examples and reference, see this link: link

    
15.09.2015 / 12:56