Suppose I have the following data frame:
set.seed(100)
base <- expand.grid(grupo = c("a", "b", "c", "d"), score = runif(100))
And I want to select the rows with the lowest score in each group,
with the number of rows per group given by the table below:
qtds <- data.frame(grupo = levels(base$grupo), qtd = c(1, 2, 3, 4))
qtds
  grupo qtd
1     a   1
2     b   2
3     c   3
4     d   4
That is, I want to select the row with the lowest score in group a, the two rows with the lowest scores in group b, and so on.
At the moment, I'm doing this:
library(dplyr)

novaBase <- data.frame()
for (i in levels(base$grupo)) {
  # keep the qtd lowest-scoring rows of group i and append them
  novaBase <- rbind(novaBase,
                    base %>%
                      filter(grupo == i) %>%
                      filter(row_number(score) <= qtds$qtd[qtds$grupo == i])
  )
}
novaBase
   grupo        score
1      a 0.0003950703
2      b 0.0003950703
3      b 0.0039051792
4      c 0.0003950703
5      c 0.0221628349
6      c 0.0039051792
7      d 0.0269371939
8      d 0.0003950703
9      d 0.0221628349
10     d 0.0039051792
This works, but it seems very inefficient, and the code is hard to understand. Does anyone know a better way?
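
One possible alternative, offered as a sketch rather than a definitive answer: join the quantities table onto the data so that each row carries its group's qtd, then filter within each group in a single pipeline. This assumes dplyr is installed. The name novaBase2 is made up for illustration, and grupo is stored as character in both tables only to avoid factor/character mismatches in the join.

```r
library(dplyr)

# Same data as in the question
set.seed(100)
base <- expand.grid(grupo = c("a", "b", "c", "d"), score = runif(100))
qtds <- data.frame(grupo = c("a", "b", "c", "d"), qtd = c(1, 2, 3, 4),
                   stringsAsFactors = FALSE)

# novaBase2 is a hypothetical name for the result of this sketch
novaBase2 <- base %>%
  mutate(grupo = as.character(grupo)) %>%  # match the key type used in qtds
  inner_join(qtds, by = "grupo") %>%       # attach each group's qtd to its rows
  group_by(grupo) %>%
  filter(row_number(score) <= qtd) %>%     # keep the qtd lowest scores per group
  ungroup() %>%
  select(grupo, score)
```

Apart from row order, this should select the same rows as the loop, since both rank rows with row_number(score) within each group.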