In R, how to calculate the average of one column based on a criterion in another column?

Question



2

I have two columns (A and B). I want to calculate the average of column A using only the rows where the corresponding value in column B is greater than 10, for example.

r statistics

asked by anonymous 11.02.2018 / 19:34

2 answers

5

This is a matter of selecting rows of a data frame with a logical condition:

set.seed(6480)    # for reproducible results

n <- 50
dados <- data.frame(A = runif(n, 0, 100), B = runif(n, 0, 40))

mean(dados[dados$B > 10, "A"])    # logical index
#[1] 51.62713

mean(dados$A[dados$B > 10])       # equivalent
#[1] 51.62713

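But if column B contains NA values, the plain logical index does not work: every NA in the index produces an NA in the selected values, so mean() returns NA. In that case we have to use which(), which keeps only the positions where the condition is TRUE.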

is.na(dados$B) <- sample(n, 10)        # set some values of B to NA

mean(dados$A[dados$B > 10])            # look at what dados$B > 10 returns
#[1] NA

mean(dados$A[which(dados$B > 10)])
#[1] 52.17357
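To see why the plain logical index gives NA, here is a minimal sketch with two small vectors. The values are made up for illustration and are not the data generated above.

```r
# Hypothetical toy vectors, only to illustrate the behaviour
A <- c(10, 20, 30, 40)
B <- c(5, 15, NA, 25)

idx <- B > 10          # FALSE TRUE NA TRUE: the comparison with NA gives NA
sel_logical <- A[idx]  # 20 NA 40: an NA in the index yields an NA element

pos <- which(B > 10)   # 2 4: which() keeps only the TRUE positions
sel_which <- A[pos]    # 20 40

mean(sel_logical)
#[1] NA
mean(sel_which)
#[1] 30
```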

Edit.

As Flávio Silva says in a comment, you can also use the na.rm argument of mean().

mean(dados$A[dados$B > 10], na.rm = TRUE)
#[1] 52.17357
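The two approaches agree here because A itself has no missing values. As a small sketch with made-up vectors (not the data above), they can differ when A also contains NA: which() only protects against NA in the condition, while na.rm = TRUE drops every NA from the selected values.

```r
# Hypothetical toy vectors: here A also has a missing value
A <- c(10, NA, 30, 40)
B <- c(5, 15, NA, 25)

m_which <- mean(A[which(B > 10)])         # selects A[2] and A[4], i.e. NA and 40
m_narm  <- mean(A[B > 10], na.rm = TRUE)  # selects NA, NA, 40 and drops the NAs

m_which
#[1] NA
m_narm
#[1] 40
```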

11.02.2018 / 19:53


score 0 · Accepted Answer

When I have a data.frame and need the averages of several columns, I use colMeans and put the condition inside the call, like this:

colMeans(dados[dados$coluna1=="A" & dados$coluna2=="0.01",])

This usually returns the column averages of the rows that meet the criterion. Just remember that leaving the part after the comma empty means you are not specifying which columns go into the calculation, so it is done with all of them. That is a problem when some columns hold characters or factors ("A", "NAME"), because colMeans cannot compute averages of non-numeric columns. To get around this, simply concatenate the numbers of the columns you want in the results. Example:

nova.tabela.contendo.as.médias<-colMeans(dados[dados$coluna1=="A" & dados$coluna2=="0.01",c(4,1,3,2,5,7)])
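A minimal runnable sketch of this idea, assuming a small made-up data frame. The columns x and y and the helper names linhas, numericas and medias are hypothetical and only serve the illustration. Instead of typing column numbers, it picks the numeric columns with is.numeric, which is one possible variation and not the only way to do it.

```r
# Hypothetical data: two grouping columns and two numeric columns
dados <- data.frame(
  coluna1 = c("A", "A", "B", "A"),
  coluna2 = c("0.01", "0.01", "0.01", "0.05"),
  x = c(1, 3, 5, 7),
  y = c(10, 20, 30, 40)
)

# Rows that meet the criterion (here rows 1 and 2)
linhas <- dados$coluna1 == "A" & dados$coluna2 == "0.01"

# Keep only the numeric columns; drop = FALSE keeps a data frame
# even if a single column is selected
numericas <- sapply(dados, is.numeric)
medias <- colMeans(dados[linhas, numericas, drop = FALSE])

medias
# x  y 
# 2 15 
```

If the criterion column can contain NA, wrapping the condition in which() as in the other answer avoids selecting all-NA rows.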