I have the following XLSX table:
I need to calculate the SP500 column average for each year. I did with 'mean (name $ SP500)' and found the total average. Now I need the average for each year. Anyone know how I can do it? Thanks!
There are several ways to do what you want.
But the data first.
set.seed(941) # Torna os resultados reprodutíveis
Data <- c("3/1/2005", "4/1/2005", "5/1/2005", "6/1/2005",
"14/2/2006", "15/2/2006", "16/2/2006", "17/2/2006",
"6/3/2007", "7/3/2007", "8/3/2007", "9/3/2007",
"13/2/2008", "14/2/2008", "15/2/2008", "16/2/2008")
SP500 <- round(runif(length(Data), 900, 1400), 2)
nomedatabela <- data.frame(Data, SP500)
Since we are going to need the year, the column Data
is transformed into a class Date
vector, with the base function as.Date
. Then to get ano
, I'll use the lubridate
package.
nomedatabela$Data <- as.Date(nomedatabela$Data, "%d/%m/%Y")
ano <- lubridate::year(nomedatabela$Data)
Now the averages.
The tapply
function groups the first argument by a factor and calculates a value of the FUN
function for each group. The result is a vector.
tapply(nomedatabela$SP500, ano, FUN = mean, na.rm = TRUE)
# 2005 2006 2007 2008
#1213.207 1092.210 1174.513 1124.352
The aggregate
function also groups and calculates a value of the FUN
function for each group but the result is a data.frame
class object. To define the groups, I'll use the formulas interface of aggregate
.
aggregate(SP500 ~ ano, nomedatabela, FUN = mean, na.rm = TRUE)
# ano SP500
#1 2005 1213.207
#2 2006 1092.210
#3 2007 1174.513
#4 2008 1124.352
Data <- c("3/1/2005", "4/1/2005", "5/1/2005", "6/1/2005",
"14/2/2006", "15/2/2006", "16/2/2006", "17/2/2006",
"6/3/2007", "7/3/2007", "8/3/2007", "9/3/2007",
"13/2/2008", "14/2/2008", "15/2/2008", "16/2/2008")
SP500 <- runif(length(Data), 900, 1400)
library(dplyr)
t <- tibble(Data, SP500)
a<- as.Date(t$Data, "%d/%m/%Y")
ano <- tibble(ano=lubridate::year(a))
base <- cbind(t,ano)
base2 <- base %>% group_by(ano) %>% summarise(media=mean(SP500))
The only difference is between my solution and the solution from above and that I'm using the dplyr package to mediate and create tibbles.