In R, how to calculate the average of one column based on another?

3

I have the following XLSX table:

I need to calculate the SP500 column average for each year. I did with 'mean (name $ SP500)' and found the total average. Now I need the average for each year. Anyone know how I can do it? Thanks!

    
asked by anonymous 20.06.2018 / 18:09

2 answers

3

There are several ways to do what you want.

But the data first.

set.seed(941)    # Torna os resultados reprodutíveis

Data <- c("3/1/2005", "4/1/2005", "5/1/2005", "6/1/2005",
          "14/2/2006", "15/2/2006", "16/2/2006", "17/2/2006",
          "6/3/2007", "7/3/2007", "8/3/2007", "9/3/2007",
          "13/2/2008", "14/2/2008", "15/2/2008", "16/2/2008")
SP500 <- round(runif(length(Data), 900, 1400), 2)

nomedatabela <- data.frame(Data, SP500)

Since we are going to need the year, the column Data is transformed into a class Date vector, with the base function as.Date . Then to get ano , I'll use the lubridate package.

nomedatabela$Data <- as.Date(nomedatabela$Data, "%d/%m/%Y")
ano <- lubridate::year(nomedatabela$Data)

Now the averages.

The tapply function groups the first argument by a factor and calculates a value of the FUN function for each group. The result is a vector.

tapply(nomedatabela$SP500, ano, FUN = mean, na.rm = TRUE)
#    2005     2006     2007     2008 
#1213.207 1092.210 1174.513 1124.352

The aggregate function also groups and calculates a value of the FUN function for each group but the result is a data.frame class object. To define the groups, I'll use the formulas interface of aggregate .

aggregate(SP500 ~ ano, nomedatabela, FUN = mean, na.rm = TRUE)
#   ano    SP500
#1 2005 1213.207
#2 2006 1092.210
#3 2007 1174.513
#4 2008 1124.352
    
20.06.2018 / 20:31
2

I'm creating a table to explain what I'm doing

Data <- c("3/1/2005", "4/1/2005", "5/1/2005", "6/1/2005",
      "14/2/2006", "15/2/2006", "16/2/2006", "17/2/2006",
      "6/3/2007", "7/3/2007", "8/3/2007", "9/3/2007",
      "13/2/2008", "14/2/2008", "15/2/2008", "16/2/2008")
SP500 <- runif(length(Data), 900, 1400)

Instead of doing t < - data.frame (Data, SP500) read the given base using read.alguma (depending on the length you are using).

From here down I'm using dplyr package

library(dplyr)
 t <- tibble(Data, SP500)

I'm adding the year

a<- as.Date(t$Data, "%d/%m/%Y")
ano <- tibble(ano=lubridate::year(a))

Finding the mean requested

base <- cbind(t,ano)   
base2 <- base %>% group_by(ano) %>% summarise(media=mean(SP500))

The only difference is between my solution and the solution from above and that I'm using the dplyr package to mediate and create tibbles.

    
21.06.2018 / 01:09