Weighted Average in R


In my dataset I have the columns c1 (year), c2 (state), c3 (weight) and c4 (value). I would like to compute the weighted average of c4, using c3 as the weights, by state (c2) for each year (c1).

 dados <- data.frame(c1 = c(rep(1996:2003, 4)), c2 = c(rep('rs', 8), rep('sc', 8)),
                     c3 = 1:32, c4 = 32:1)
 dados
      c1 c2 c3 c4
 1  1996 rs  1 32
 2  1997 rs  2 31
 3  1998 rs  3 30
 4  1999 rs  4 29
 5  2000 rs  5 28
 6  2001 rs  6 27
 7  2002 rs  7 26
 8  2003 rs  8 25
 9  1996 sc  9 24
 10 1997 sc 10 23
 11 1998 sc 11 22
 12 1999 sc 12 21
 13 2000 sc 13 20
 14 2001 sc 14 19
 15 2002 sc 15 18
 16 2003 sc 16 17
 17 1996 rs 17 16
 18 1997 rs 18 15
 19 1998 rs 19 14
 20 1999 rs 20 13
 21 2000 rs 21 12
 22 2001 rs 22 11
 23 2002 rs 23 10
 24 2003 rs 24  9
 25 1996 sc 25  8
 26 1997 sc 26  7
 27 1998 sc 27  6
 28 1999 sc 28  5
 29 2000 sc 29  4
 30 2001 sc 30  3
 31 2002 sc 31  2
 32 2003 sc 32  1
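For reference, the weighted average of c4 with weights c3 within one group is sum(c3 * c4) / sum(c3). Below is a minimal base R check for the 1996/rs group, which has two rows (c3 = 1, c4 = 32 and c3 = 17, c4 = 16). The object names g, manual and builtin are only illustrative.

```r
# Rebuild the example data from the question (c2 is recycled to 32 rows)
dados <- data.frame(c1 = rep(1996:2003, 4),
                    c2 = c(rep("rs", 8), rep("sc", 8)),
                    c3 = 1:32, c4 = 32:1)

# Rows for year 1996, state "rs"
g <- dados[dados$c1 == 1996 & dados$c2 == "rs", ]

# Weighted average by hand: sum(weight * value) / sum(weight)
manual <- sum(g$c3 * g$c4) / sum(g$c3)

# The same computation with base R's weighted.mean(x, w)
builtin <- weighted.mean(g$c4, g$c3)

manual   # (1*32 + 17*16) / (1 + 17) = 304 / 18
# [1] 16.88889
builtin
# [1] 16.88889
```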
asked by anonymous 06.12.2018 / 02:20

2 answers


I'm not sure I understood you correctly. Here is a solution that computes, for each year and state, the average of c4 weighted by c3. Is that what you need?

library(tidyverse)
dados %>% 
  group_by(c1, c2) %>% 
  summarise(media = weighted.mean(c4, c3))

# # A tibble: 16 x 3
# # Groups:   c1 [?]
#       c1 c2    media
#    <int> <fct> <dbl>
#  1  1996 rs    16.9 
#  2  1996 sc    12.2 
#  3  1997 rs    16.6 
#  4  1997 sc    11.4 
#  5  1998 rs    16.2 
#  6  1998 sc    10.6 
#  7  1999 rs    15.7 
#  8  1999 sc     9.8 
#  9  2000 rs    15.1 
# 10  2000 sc     8.95
# 11  2001 rs    14.4 
# 12  2001 sc     8.09
# 13  2002 rs    13.7 
# 14  2002 sc     7.22
# 15  2003 rs    13   
# 16  2003 sc     6.33
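If loading the tidyverse is not an option, the same per-year, per-state weighted average can be sketched in base R. The idea is to split the data on c1 and c2 and call weighted.mean() on each piece. This is an alternative sketch, not part of the original answer, and the object names grupos and res are arbitrary.

```r
dados <- data.frame(c1 = rep(1996:2003, 4),
                    c2 = c(rep("rs", 8), rep("sc", 8)),
                    c3 = 1:32, c4 = 32:1)

# One data frame per (year, state) combination; drop = TRUE skips empty combinations
grupos <- split(dados, list(dados$c1, dados$c2), drop = TRUE)

# For each group: weighted average of c4 using c3 as the weights
res <- do.call(rbind, lapply(grupos, function(d) {
  data.frame(c1 = d$c1[1], c2 = d$c2[1],
             media = weighted.mean(d$c4, d$c3))
}))

# Sort by year, then state, and reset the row names
res <- res[order(res$c1, res$c2), ]
rownames(res) <- NULL
res
```

The values should match the tibble above, for example 16.9 for 1996/rs and 13 for 2003/rs.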
    
06.12.2018 / 17:13

Here are two ways to calculate the (simple, unweighted) averages for groups of c1 and c2.

Base R
The aggregate function is well suited to this: it is simple and solves the problem in a single line of code.

aggregate(c4 ~ c1 + c2, dados, mean, na.rm = TRUE)
#     c1 c2 c4
#1  1996 rs 24
#2  1997 rs 23
#3  1998 rs 22
#4  1999 rs 21
#5  2000 rs 20
#6  2001 rs 19
#7  2002 rs 18
#8  2003 rs 17
#9  1996 sc 16
#10 1997 sc 15
#11 1998 sc 14
#12 1999 sc 13
#13 2000 sc 12
#14 2001 sc 11
#15 2002 sc 10
#16 2003 sc  9
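Note that mean gives the simple, unweighted average. Because aggregate() passes FUN one column at a time, weights cannot be handed to weighted.mean() directly. One workaround, sketched below, sums c3 * c4 and c3 within each group and divides the two. The helper column c3c4 is hypothetical and is added only for this sketch.

```r
dados <- data.frame(c1 = rep(1996:2003, 4),
                    c2 = c(rep("rs", 8), rep("sc", 8)),
                    c3 = 1:32, c4 = 32:1)

# Hypothetical helper column: weight times value for each row
dados$c3c4 <- dados$c3 * dados$c4

# Sum both columns within each (year, state) group
somas <- aggregate(cbind(c3c4, c3) ~ c1 + c2, data = dados, FUN = sum)

# Weighted average = sum(c3 * c4) / sum(c3)
somas$media <- somas$c3c4 / somas$c3
somas[, c("c1", "c2", "media")]
```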

Package dplyr

The dplyr package has become a de facto standard in R, at least among the growing number of tidyverse users.

library(dplyr)

dados %>%
  group_by(c1, c2) %>%
  mutate(n = n(),
         média = mean(c4, na.rm = TRUE))
## A tibble: 32 x 6
## Groups:   c1, c2 [16]
#      c1 c2       c3    c4     n média
#   <int> <fct> <int> <int> <int> <dbl>
# 1  1996 rs        1    32     2    24
# 2  1997 rs        2    31     2    23
# 3  1998 rs        3    30     2    22
# 4  1999 rs        4    29     2    21
# 5  2000 rs        5    28     2    20
# 6  2001 rs        6    27     2    19
# 7  2002 rs        7    26     2    18
# 8  2003 rs        8    25     2    17
# 9  1996 sc        9    24     2    16
#10  1997 sc       10    23     2    15
## ... with 22 more rows
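The dplyr version above uses mean(), so it gives an unweighted average. The sketch below keeps the same mutate()-style layout, where every row receives its group's value, but applies the weights from the question. It uses base R's ave(). The column name media_pond is only illustrative.

```r
dados <- data.frame(c1 = rep(1996:2003, 4),
                    c2 = c(rep("rs", 8), rep("sc", 8)),
                    c3 = 1:32, c4 = 32:1)

# ave() returns a vector as long as its input, with FUN applied within each
# (c1, c2) group, so every row gets its group's total (like group_by() + mutate())
soma_wx <- ave(dados$c3 * dados$c4, dados$c1, dados$c2, FUN = sum)
soma_w  <- ave(dados$c3, dados$c1, dados$c2, FUN = sum)

# Weighted average of c4 (weights c3), repeated on each row of the group
dados$media_pond <- soma_wx / soma_w
head(dados, 3)
```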
    
06.12.2018 / 08:06