I find the following approach more concise for what you need.
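To make the examples reproducible, I'll assume an input list like the one below (a reconstruction based on the output shown further down; the question's actual data may differ):
lista <- list(
  num = list(1:10, 1:10, 1:10),
  chr = list(letters, letters, letters)
)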
library(purrr) # for the map function
library(tidyr) # for the unnest function
library(dplyr) # for the as_data_frame function
map(lista, ~map(.x, ~.x[1:10])) %>%
as_data_frame() %>%
unnest()
The result is this:
# A tibble: 30 × 2
num chr
<int> <chr>
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 6 f
7 7 g
8 8 h
9 9 i
10 10 j
# ... with 20 more rows
Another way, which also turns out quite nicely, is:
lista %>%
as_data_frame() %>%
mutate(chr = map(chr, ~.x[1:10])) %>%
unnest()
This second version uses list columns, that is, columns of a data.frame that are themselves lists. They are being widely used and have been popularized by Hadley Wickham; see the list-columns chapter of R for Data Science.
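A quick way to inspect the list-column structure before unnesting, using the assumed lista from above:
df <- as_data_frame(lista) # a 3-row tibble with two list columns
df$chr[[1]]                # each cell holds an entire character vector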
In the list-column example I modified only the chr column (unnest() requires the list columns being unnested to have matching lengths within each row, and num already holds length-10 vectors), but you could modify all the columns using:
lista %>%
as_data_frame() %>%
mutate_all(funs(map(., ~.x[1:10]))) %>%
unnest()
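A side note for readers on newer package versions: funs() and as_data_frame() have since been deprecated. With dplyr >= 1.0 and tidyr >= 1.0, an equivalent pipeline would be:
lista %>%
  as_tibble() %>%
  mutate(across(everything(), ~map(.x, ~.x[1:10]))) %>%
  unnest(cols = everything())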
Complementing Tomás's Benchmark
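The calls below use pegar_elem from Tomás's answer, which I won't repeat here; as a rough sketch (an assumption on my part, not his exact code), it extracts the given positions from each element with a for loop:
library(microbenchmark) # for the timings below
pegar_elem <- function(l, pos) {
  res <- vector("list", length(l))
  for (i in seq_along(l)) {
    res[[i]] <- l[[i]][pos] # keep only the requested positions
  }
  res
}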
> lista <- list(
+ num = lapply(1:10, function(x) sample(1:100, 20)),
+ chr = lapply(1:10, function(x) sample(letters, 20))
+ )
> microbenchmark(
+ solucao_tomas = {as.data.frame(sapply(lapply(lista, pegar_elem, 1:10), unlist))},
+ solucao_daniel = {unnest(as_data_frame(map(lista, ~map(.x, ~.x[1:10]))))}
+ )
Unit: microseconds
expr min lq mean median uq max neval
solucao_tomas 419.026 439.375 466.7568 454.947 476.889 695.780 100
solucao_daniel 2456.108 2559.625 2745.8009 2680.130 2836.733 4466.647 100
> lista <- list(
+ num = lapply(1:1000, function(x) sample(1:100, 20)),
+ chr = lapply(1:1000, function(x) sample(letters, 20))
+ )
> microbenchmark(
+ solucao_tomas = {as.data.frame(sapply(lapply(lista, pegar_elem, 1:10), unlist))},
+ solucao_daniel = {unnest(as_data_frame(map(lista, ~map(.x, ~.x[1:10]))))}
+ )
Unit: milliseconds
expr min lq mean median uq max neval
solucao_tomas 13.559905 14.15854 14.64829 14.56517 14.83060 16.89264 100
solucao_daniel 9.871144 10.27053 11.07952 10.80652 11.29402 19.82793 100
> lista <- list(
+ num = lapply(1:10000, function(x) sample(1:100, 20)),
+ chr = lapply(1:10000, function(x) sample(letters, 20))
+ )
> microbenchmark(
+ solucao_tomas = {as.data.frame(sapply(lapply(lista, pegar_elem, 1:10), unlist))},
+ solucao_daniel = {unnest(as_data_frame(map(lista, ~map(.x, ~.x[1:10]))))}
+ )
Unit: milliseconds
expr min lq mean median uq max neval
solucao_tomas 156.63202 171.06855 195.3683 180.86325 227.1462 271.7314 100
solucao_daniel 80.93934 91.22597 100.5079 96.73947 104.7544 154.6254 100
That is, when the list is small, Tomás's solution using for is more efficient, but the difference is on the order of microseconds (efficiency matters little when objects are small). As the objects grow, the solution using purrr, dplyr, and tidyr becomes more efficient: with lists of size 10,000 it is about 2x faster. In other words, this solution is efficient when it needs to be, that is, when the objects get large.