How to group the rows of a data frame by date when some measurements are missing?


I have a data frame with a 30-year time series containing three daily measurements for each variable. I need to collapse it to one row per day and discard the empty (NA) entries. The subset(x, ...) function already solves part of the problem. However, on some days no measurement was recorded at all, as in the "prec" column for the date "1961-08-21". In that case I need to keep a row for that day with NA, indicating that no measurement was taken. How can I do this?

date        id      prec    tair    tw      tmax    tmin
1961-08-21  83377   NA      22.6    14.1    27.9    NA
1961-08-21  83377   NA      23.8    15.2    NA      13.8
1961-08-21  83377   NA      24.2    15.4    NA      NA
1961-08-22  83377   NA      22.6    14.1    29.7    NA
1961-08-22  83377   0       24.8    14.6    NA      13.9
1961-08-22  83377   NA      27      16      NA      NA
1961-08-23  83377   NA      24.6    14      28.8    NA
1961-08-23  83377   1       19.8    14.6    NA      13.8
1961-08-23  83377   2       18.8    14.7    NA      13.6
    
asked by anonymous 27.03.2018 / 15:14


You can solve this problem with the dplyr package:

dados <- structure(list(date = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
 .Label = c("1961-08-21", "1961-08-22", "1961-08-23"), class = "factor"), 
 id = c(83377L, 83377L, 83377L, 83377L, 83377L, 83377L, 83377L, 83377L, 83377L), 
 prec = c(NA, NA, NA, NA, 0L, NA, NA, 1L, 2L), 
 tair = c(22.6, 23.8, 24.2, 22.6, 24.8, 27, 24.6, 19.8, 18.8), 
 tw = c(14.1, 15.2, 15.4, 14.1, 14.6, 16, 14, 14.6, 14.7), 
 tmax = c(27.9, NA, NA, 29.7, NA, NA, 28.8, NA, NA), 
 tmin = c(NA, 13.8, NA, NA, 13.9, NA, NA, 13.8, 13.6)), 
 .Names = c("date", "id", "prec", "tair", "tw", "tmax", "tmin"), 
 class = "data.frame", 
 row.names = c(NA, -9L))

library(dplyr)

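# summarise_all() and funs() are superseded/deprecated in current dplyr;
# across() does the same job: the mean of every non-grouping column per date
dados %>%
  group_by(date) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE), .names = "{.col}_Media"))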
# A tibble: 3 x 7
  date       id_Media prec_Media tair_Media tw_Media tmax_Media tmin_Media
  <fct>         <dbl>      <dbl>      <dbl>    <dbl>      <dbl>      <dbl>
1 1961-08-21   83377.     NaN          23.5     14.9       27.9       13.8
2 1961-08-22   83377.       0.         24.8     14.9       29.7       13.9
3 1961-08-23   83377.       1.50       21.1     14.4       28.8       13.7      

Basically, I grouped the data by date and calculated the mean of each of the other columns, ignoring the NA values. Note that I also averaged id. Since id is presumably the same for every row of a given date, averaging it returns the same value, so it makes no difference whether you include it or not.
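In the output above, prec for 1961-08-21 comes out as NaN rather than NA, because mean(x, na.rm = TRUE) on a vector that is entirely NA returns NaN. Since the question asks for NA on days with no measurement, here is a minimal sketch of one way to get that, assuming dplyr >= 1.0 (for across() and the .groups argument). The helper mean_or_na is hypothetical, not a dplyr function, and this variant groups by id instead of averaging it.

```r
library(dplyr)

# Same data as above, written without dput() for brevity
dados <- data.frame(
  date = rep(c("1961-08-21", "1961-08-22", "1961-08-23"), each = 3),
  id   = 83377L,
  prec = c(NA, NA, NA, NA, 0L, NA, NA, 1L, 2L),
  tair = c(22.6, 23.8, 24.2, 22.6, 24.8, 27, 24.6, 19.8, 18.8),
  tw   = c(14.1, 15.2, 15.4, 14.1, 14.6, 16, 14, 14.6, 14.7),
  tmax = c(27.9, NA, NA, 29.7, NA, NA, 28.8, NA, NA),
  tmin = c(NA, 13.8, NA, NA, 13.9, NA, NA, 13.8, 13.6)
)

# Hypothetical helper: mean of the non-missing values, but NA (not NaN)
# when a day has no recorded value at all
mean_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}

# One row per date; id is a grouping column, so it is kept, not averaged
resumo <- dados %>%
  group_by(date, id) %>%
  summarise(across(everything(), mean_or_na), .groups = "drop")

print(resumo)
```

With this, 1961-08-21 keeps a row whose prec is NA, while days that have at least one measurement get the mean of the recorded values.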

    
27.03.2018 / 17:28