R - average of one variable grouped by the values of another variable in a data frame, and replacing NA values with 0

I have a multi-column data frame. How do I calculate the average of one variable based on the values of another variable? I have the frequencies of several species recorded over 4 campaigns, and I want to calculate the average for each recorded species. To do this I must divide the sum of the observed frequencies by the number of campaigns carried out at each site, but the function I used

dadomean = dcast(dados, local  ~ especie, mean)

calculates the average using only the campaigns in which the species was recorded, and ignores the campaigns where the record was 0. The same happens with

dadomean = dados %>%
  group_by(local, especie) %>%
  summarise(mean(frequencia))

I've also tried

dadomean = dcast(dados, local ~ especie, mean, subset = .(campanha == 4))

but R did not accept it and gave this error:

Error in .(campanha == 4) : could not find function "."

I also tried the following and it did not work.

dadomean = dcast(dados, local  ~ especie, mean, na.rm=TRUE, margins = "campanha")

The result also always has NA for the sites where the value should be 0, and I could not convert those to 0.

campanha  local  especie  frequencia
1         A      aa       1
1         A      bb       2
1         A      cc       1
1         B      bb       1
1         B      dd       7
2         A      aa       50
2         A      bb       1
2         A      dd       8
3         A      aa       2
3         B      aa       3
3         B      dd       3
4         A      aa       33
4         A      bb       5
4         A      cc       1
4         A      dd       1
4         B      aa       18
4         B      bb       10
4         B      dd       6
    
asked by anonymous 23.07.2018 / 22:33

2 answers

The question is rather confusing. It asks for averages of frequencia grouped by campanha, but then only gives code examples where the grouping is by local and especie.

I'll first group by campanha.

aggregate(frequencia ~ campanha, dados, mean, na.rm = TRUE)
#  campanha frequencia
#1        1   2.400000
#2        2  19.666667
#3        3   2.666667
#4        4  10.571429

Now I'll group by local and especie, using both the reshape2 package and the base function tapply. As you can see, the results are identical; the only difference is that dcast assigns NaN when the mean cannot be calculated, while tapply assigns NA. Replacing those values with 0 is done in exactly the same way in both cases.

library(reshape2)

dadomean1 <- dcast(dados, local  ~ especie, mean, value.var = "frequencia")
dadomean1[is.na(dadomean1)] <- 0
dadomean1
#  local   aa       bb  cc       dd
#1     A 21.5 2.666667   1 4.500000
#2     B 10.5 5.500000   0 5.333333


dadomean2 <- with(dados, tapply(frequencia, list(local, especie), mean))
dadomean2[is.na(dadomean2)] <- 0
dadomean2
#    aa       bb cc       dd
#A 21.5 2.666667  1 4.500000
#B 10.5 5.500000  0 5.333333
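
The reason the same replacement line works for both results can be checked in isolation. This is a small sketch with made-up values (the names media_vazia, grupo and por_grupo are illustrative only): by default dcast fills empty cells with the aggregation function applied to a zero-length vector, tapply fills them with NA, and is.na() is TRUE for both.

```r
# mean() of an empty vector is NaN; this is the value dcast() uses by default
# to fill local/especie combinations that have no rows.
media_vazia <- mean(numeric(0))

# tapply() leaves NA in cells that have no observations (here, level "b").
grupo <- factor(c("a", "a"), levels = c("a", "b"))
por_grupo <- tapply(c(1, 3), grupo, mean)

is.nan(media_vazia)       # TRUE
is.nan(por_grupo["b"])    # FALSE
is.na(por_grupo["b"])     # TRUE

# is.na() is TRUE for both NA and NaN, so one replacement handles both cases.
x <- c(media_vazia, unname(por_grupo["b"]), 1.5)
x[is.na(x)] <- 0
x                         # 0.0 0.0 1.5
```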

EDIT.

To calculate averages grouped by especie and local that take all campaigns into account, not just those in which the species was recorded, it is best to define a function, mediaCamp, that does this calculation. Then aggregate is used again.

mediaCamp <- function(x){
  ncamp <- length(unique(dados$campanha))
  sum(x)/ncamp
}

dadomean3 <- aggregate(frequencia ~ especie + local, dados, mediaCamp)
dadomean3
#  especie local frequencia
#1      aa     A      21.50
#2      bb     A       2.00
#3      cc     A       0.50
#4      dd     A       2.25
#5      aa     B       5.25
#6      bb     B       2.75
#7      dd     B       4.00
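
If a wide table with explicit zeros is wanted (for example, B/cc does not appear in the output above), one possible sketch is to sum with xtabs, which puts 0 in combinations that have no rows, and then divide by the number of campaigns. The second variant divides by the number of campaigns recorded at each local (4 for A, 3 for B in this data); that is only an assumption about what "the number of campaigns performed in each place" means. The names somas, media_global, camp_por_local and media_local are illustrative.

```r
# Same data as in the question, rebuilt here so the example runs on its own.
dados <- data.frame(
  campanha   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4),
  local      = c("A", "A", "A", "B", "B", "A", "A", "A", "A",
                 "B", "B", "A", "A", "A", "A", "B", "B", "B"),
  especie    = c("aa", "bb", "cc", "bb", "dd", "aa", "bb", "dd", "aa",
                 "aa", "dd", "aa", "bb", "cc", "dd", "aa", "bb", "dd"),
  frequencia = c(1, 2, 1, 1, 7, 50, 1, 8, 2, 3, 3, 33, 5, 1, 1, 18, 10, 6)
)

# Sum of frequencia per local x especie; missing combinations become 0.
somas <- xtabs(frequencia ~ local + especie, data = dados)

# Variant 1: divide by the total number of campaigns, as mediaCamp does.
ncamp <- length(unique(dados$campanha))
media_global <- somas / ncamp

# Variant 2 (assumption): divide each row by the number of distinct
# campaigns that have at least one record at that local.
camp_por_local <- tapply(dados$campanha, dados$local,
                         function(x) length(unique(x)))
media_local <- sweep(somas, 1, as.vector(camp_por_local[rownames(somas)]), "/")

media_global
media_local
```

With variant 1 the non-zero cells match dadomean3, and B/cc is 0 instead of being absent.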

DATA in dput format.

dados <-
structure(list(campanha = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), local = structure(c(1L, 
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L), .Label = c("A", "B"), class = "factor"), especie = structure(c(1L, 
2L, 3L, 2L, 4L, 1L, 2L, 4L, 1L, 1L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 
4L), .Label = c("aa", "bb", "cc", "dd"), class = "factor"), frequencia = c(1L, 
2L, 1L, 1L, 7L, 50L, 1L, 8L, 2L, 3L, 3L, 33L, 5L, 1L, 1L, 18L, 
10L, 6L)), .Names = c("campanha", "local", "especie", "frequencia"
), class = "data.frame", row.names = c(NA, -18L))
    
24.07.2018 / 18:51

I do not know if this is exactly what you want.

The average of each species in each location.

library(dplyr)
group_by(dados, especie, local) %>% summarise(Total = mean(frequencia))
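
Note that this mean only covers the campaigns in which the species was recorded. If campaigns without a record should count as 0, the missing rows have to be added before averaging. A minimal base R sketch, assuming every campanha/local/especie combination that is absent from the data really means a frequency of 0 (the names grade and completo are illustrative):

```r
# Same data as in the question, rebuilt here so the example runs on its own.
dados <- data.frame(
  campanha   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4),
  local      = c("A", "A", "A", "B", "B", "A", "A", "A", "A",
                 "B", "B", "A", "A", "A", "A", "B", "B", "B"),
  especie    = c("aa", "bb", "cc", "bb", "dd", "aa", "bb", "dd", "aa",
                 "aa", "dd", "aa", "bb", "cc", "dd", "aa", "bb", "dd"),
  frequencia = c(1, 2, 1, 1, 7, 50, 1, 8, 2, 3, 3, 33, 5, 1, 1, 18, 10, 6)
)

# Every campanha x local x especie combination (4 * 2 * 4 = 32 rows).
grade <- expand.grid(
  campanha = unique(dados$campanha),
  local    = unique(dados$local),
  especie  = unique(dados$especie),
  stringsAsFactors = FALSE
)

# Left join: combinations without a record get NA, which is then set to 0.
completo <- merge(grade, dados, all.x = TRUE)
completo$frequencia[is.na(completo$frequencia)] <- 0

# A plain mean now averages over all campaigns, zeros included.
aggregate(frequencia ~ local + especie, completo, mean)
```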
    
23.07.2018 / 23:20