Error with the function lapply

I am trying to run the following code:

result<-lapply(mylist,function(x)cbind(x,var=tapply(x[,c(3)],x[,c(1)],sum)))

But this error appears:

Error in data.frame(..., check.names = FALSE) : 
arguments imply differing number of rows: 340623, 63073

I need to return the sum within each data frame in my list. What is wrong with my code?

Unfortunately, I cannot share the data.

    
asked by anonymous 08.10.2018 / 21:38

1 answer

Without the data it is hard to reproduce the problem you are encountering. I will use the structure you have already posted in other questions:

mylist
[[1]]
     number group      sexo
1  26.12186     a Masculino
2  40.39104     a Masculino
3  29.29426     a Masculino
4  45.11651     b  Feminino
5  26.72512     b Masculino
6  45.95550     b Masculino
7  47.56538     c  Feminino
8  43.14062     c  Feminino
9  47.42608     c Masculino
10 23.57519     c  Feminino

[[2]]
     number group      sexo
1  47.64770     a Masculino
2  22.61412     a  Feminino
3  48.37883     a Masculino
4  48.44754     b Masculino
5  41.67047     b  Feminino
6  23.74823     b Masculino
7  28.82786     c Masculino
8  30.12309     c  Feminino
9  27.12305     c Masculino
10 49.58259     c  Feminino
11 40.21284     d Masculino
12 40.57279     d  Feminino
13 48.33335     d Masculino
14 22.92160     d Masculino
15 25.07216     e Masculino

I went through what you want step by step. Running tapply returns a one-dimensional named array with one element per group (two here), holding the sum of the values by sex:

tapply(mylist[[1]][,1], mylist[[1]][,3], sum)
 Feminino Masculino 
 159.3977  215.9139

This is why the error appears: cbind tries to combine a data.frame with the result of tapply, which has a different number of rows.
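The mismatch can be checked directly by comparing the two sizes. The sketch below uses a small made-up data frame (the values in `df` are hypothetical, only the column layout follows the example above), so it illustrates the idea rather than reproducing your data:

```r
# Toy data (hypothetical values), same column layout as the example above
df <- data.frame(
  number = c(10, 20, 30, 40, 50),
  group  = c("a", "a", "b", "b", "c"),
  sexo   = c("Masculino", "Feminino", "Masculino", "Masculino", "Feminino")
)

# One sum per level of sexo, not one per row of df
sums <- tapply(df$number, df$sexo, sum)

nrow(df)      # 5: one row per observation
length(sums)  # 2: one element per sex

# cbind() cannot line up 5 rows with 2 values;
# try() captures the failure instead of stopping the script
res <- try(cbind(df, var = sums), silent = TRUE)
inherits(res, "try-error")
```

Running the same two checks (`nrow()` and `length()`) on one element of your own list should show the two numbers from your error message.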

To work around this, and assuming that what you want is to add the sum for each sex as a new variable, you can build on the following code:

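teste <- mylist[[1]]
# Sum of number (column 1) for each sexo (column 3)
teste1 <- tapply(teste[,1], teste[,3], sum)
# Reshape the sums into a data.frame with columns sexo and value
teste2 <- tidyr::gather(data.frame(teste1), key = "sexo")
# gather() stores the column name "teste1" in sexo; replace it with the group names
teste2$sexo <- names(teste1)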

dplyr::left_join(teste, teste2)
Joining, by = "sexo"
     number group      sexo    value
1  26.12186     a Masculino 215.9139
2  40.39104     a Masculino 215.9139
3  29.29426     a Masculino 215.9139
4  45.11651     b  Feminino 159.3977
5  26.72512     b Masculino 215.9139
6  45.95550     b Masculino 215.9139
7  47.56538     c  Feminino 159.3977
8  43.14062     c  Feminino 159.3977
9  47.42608     c Masculino 215.9139
10 23.57519     c  Feminino 159.3977

I have only worked through the first data.frame in the list, just to understand the problem.
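To extend the same idea to every element of the list, one possible sketch uses base R's `ave()` in place of the tapply/join steps, since `ave()` returns one value per row and so always matches the data.frame's length. The list below is made up for illustration, and grouping `number` by `sexo` is an assumption carried over from the example above; swap in your own columns as needed:

```r
# Hypothetical two-element list with the same columns as mylist above
mylist <- list(
  data.frame(number = c(1, 2, 3, 4),
             group  = c("a", "a", "b", "b"),
             sexo   = c("Masculino", "Feminino", "Masculino", "Feminino")),
  data.frame(number = c(10, 20, 30),
             group  = c("a", "b", "b"),
             sexo   = c("Feminino", "Feminino", "Masculino"))
)

result <- lapply(mylist, function(x) {
  # ave() repeats each group's sum on every row belonging to that group
  x$value <- ave(x$number, x$sexo, FUN = sum)
  x
})

result[[1]]
```

Each element of `result` keeps its original rows and gains a `value` column, so no join is needed.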

    
08.10.2018 / 22:42