Select multiple rows from a data.frame by the largest values in R

4

I have the following data.frame in R:

df <- data.frame(x = c(10,10,2,3,4,8,8,8),
                 y = c(5,4,6,7,8,3,2,4))
df
   x y
1 10 5
2 10 4
3  2 6
4  3 7
5  4 8
6  8 3
7  8 2
8  8 4

I would like to get all rows containing the top 5 values of the x column.

Example:

The top five of the x column are: 10, 10, 8, 8, 8.

I can get these values with the following code:

rev(sort(df$x))[1:5]
[1] 10 10  8  8  8

But I'd like to get the entire row, not just the values from the x column. So the result I want is:

1 10 5
2 10 4
6  8 3
7  8 2
8  8 4

And not:

> [1] 10 10  8  8  8
    
asked by anonymous 13.06.2017 / 18:56

2 answers

4

Using the dplyr package:

library(dplyr)
df %>%
  top_n(n = 5, wt = x)
   x y
1 10 5
2 10 4
3  8 3
4  8 2
5  8 4
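
In dplyr 1.0.0 and later, top_n() is marked as superseded in favor of slice_max(). The following is only a minimal sketch of the same selection, assuming a dplyr version that provides slice_max(); the name top5 is introduced here for illustration.

```r
library(dplyr)

df <- data.frame(x = c(10, 10, 2, 3, 4, 8, 8, 8),
                 y = c(5, 4, 6, 7, 8, 3, 2, 4))

# Keep the rows with the 5 largest values of x.
# with_ties = TRUE is the default, so rows tied with the 5th value
# are also kept and the result can have more than n rows.
top5 <- df %>%
  slice_max(x, n = 5)

top5
```

With this data the three rows where x is 8 tie for the last places, and together with the two rows where x is 10 they make exactly five rows. Setting with_ties = FALSE would cap the result at n rows.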

Using order, one of R's base functions:

df[order(df$x, decreasing=TRUE), ][1:5, ]
   x y
1 10 5
2 10 4
6  8 3
7  8 2
8  8 4

Note that the dplyr solution renumbers the rows of its output, so the result no longer shows where each row came from, whereas the solution with order keeps the original row names, which tell you which rows of the original data frame were selected.
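
The row-name difference can be checked directly in base R. This is a small sketch; the names top5 and kept are introduced here for illustration only.

```r
df <- data.frame(x = c(10, 10, 2, 3, 4, 8, 8, 8),
                 y = c(5, 4, 6, 7, 8, 3, 2, 4))

# Sort by x in decreasing order and keep the first five rows.
top5 <- head(df[order(df$x, decreasing = TRUE), ], 5)

# The row names still refer to positions in the original data frame.
kept <- rownames(top5)
kept
# [1] "1" "2" "6" "7" "8"

# Dropping the row names gives the plain 1..5 numbering
# seen in the dplyr output.
rownames(top5) <- NULL
```

If the original positions are needed later, as.integer(kept) can be used to index df again.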

    
13.06.2017 / 19:09
2

To complement the other answer, here is how to do it with data.table:

library(data.table)
setDT(df)
df[order(x, decreasing = TRUE), ][1:5, ]
    x y
1: 10 5
2: 10 4
3:  8 3
4:  8 2
5:  8 4

To drop rows with a duplicated value in the x column first, then sort by x and pick the top 5:

df[!duplicated(x), ][order(x, decreasing = TRUE), ][1:5, ]
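
As a rough sketch of what this produces on the example data, assuming the data.table package is installed (the names tbl, uniq and top5_unique are introduced here for illustration):

```r
library(data.table)

tbl <- data.table(x = c(10, 10, 2, 3, 4, 8, 8, 8),
                  y = c(5, 4, 6, 7, 8, 3, 2, 4))

# duplicated() flags every repeat of a value after its first occurrence,
# so this keeps only the first row seen for each distinct x.
uniq <- tbl[!duplicated(x)]

# Sort by x in decreasing order and take up to five rows.
top5_unique <- head(uniq[order(-x)], 5)

top5_unique$x
# [1] 10  8  4  3  2
```

The sketch uses head() instead of [1:5, ] because indexing past the last row with 1:5 pads the result with NA rows when there are fewer than five distinct values, while head() simply returns the rows that exist.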
    
13.06.2017 / 19:53