Select the rows of a data.frame with the largest values in a column in R

Question

4

I have the following data.frame in R:

df <- data.frame(x = c(10,10,2,3,4,8,8,8),
                 y = c(5,4,6,7,8,3,2,4))
df
   x y
1 10 5
2 10 4
3  2 6
4  3 7
5  4 8
6  8 3
7  8 2
8  8 4

I would like to get all the rows containing the five largest values of the x column.

Example:

The top five of the x column are: 10, 10, 8, 8, 8.

I can get these values with the following code:

rev(sort(df$x))[1:5]
[1] 10 10  8  8  8

But I'd like to get the entire rows, not just the values from the x column. So the result I want is:

   x y
1 10 5
2 10 4
6  8 3
7  8 2
8  8 4

And not:

[1] 10 10  8  8  8

data r dplyr

asked by anonymous 13.06.2017 / 18:56

2 answers

score 2

To complement the other answer, here is how to do it with data.table:

library(data.table)
setDT(df)
df[order(x, decreasing = TRUE), ][1:5, ]
    x y
1: 10 5
2: 10 4
3:  8 3
4:  8 2
5:  8 4

To drop rows with a repeated x value first (keeping the first row for each distinct x), then sort by x and pick the top 5:

df[!duplicated(x), ][order(x, decreasing = TRUE), ][1:5, ]
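
For readers without data.table, the same de-duplicate, sort and take-five idea can be sketched in base R. This is only an illustration on the question's sample data; the names uniq and top_unique are made up for the example, and head() is used instead of [1:5, ] on the assumption that padding with NA rows is unwanted when fewer than five distinct values exist.

```r
# Sample data from the question
df <- data.frame(x = c(10, 10, 2, 3, 4, 8, 8, 8),
                 y = c(5, 4, 6, 7, 8, 3, 2, 4))

# Keep only the first row for each distinct value of x
uniq <- df[!duplicated(df$x), ]

# Sort by x, largest first, and take at most five rows
top_unique <- head(uniq[order(uniq$x, decreasing = TRUE), ], 5)
top_unique
```

Because the sample data has exactly five distinct x values, every one of them appears once in the result, ordered from 10 down to 2.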

13.06.2017 / 19:53


score 4 · Accepted Answer

Using the dplyr package:

library(dplyr)
df %>%
  top_n(n = 5, wt = x)
   x y
1 10 5
2 10 4
3  8 3
4  8 2
5  8 4
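
In dplyr 1.0.0 and later, top_n() is documented as superseded by slice_max(). A minimal sketch of the same selection with slice_max(), assuming such a dplyr version is installed, also shows how ties at the cut-off are handled. The object names top5, top3_ties and top3_exact are only illustrative.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(x = c(10, 10, 2, 3, 4, 8, 8, 8),
                 y = c(5, 4, 6, 7, 8, 3, 2, 4))

# Rows holding the five largest x values, sorted from largest to smallest
top5 <- df %>% slice_max(x, n = 5)

# Ties at the cut-off are kept by default, so n = 3 still returns
# all three rows with x == 8 (five rows in total)
top3_ties <- df %>% slice_max(x, n = 3)

# with_ties = FALSE returns exactly n rows
top3_exact <- df %>% slice_max(x, n = 3, with_ties = FALSE)
```

Whether ties should be kept depends on whether "top 5" means five rows or the five highest-ranked values, so the with_ties argument is worth setting explicitly.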

Using order, one of R's base functions:

df[order(df$x, decreasing=TRUE), ][1:5, ]
   x y
1 10 5
2 10 4
6  8 3
7  8 2
8  8 4

Note that the dplyr solution renumbers the rows of its output, so the result no longer shows where each row came from. The order solution keeps the original row names (1, 2, 6, 7, 8), which tell you which rows of the original data frame were selected.
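
The row-name behaviour of the base R approach can be checked directly. The sketch below uses the question's sample data; the names top5, kept_rows and top5_reset are made up for the example.

```r
# Sample data from the question
df <- data.frame(x = c(10, 10, 2, 3, 4, 8, 8, 8),
                 y = c(5, 4, 6, 7, 8, 3, 2, 4))

# Base R selection: the original row names are carried over
top5 <- df[order(df$x, decreasing = TRUE), ][1:5, ]

# Positions of the selected rows in the original data frame
kept_rows <- as.integer(rownames(top5))

# Resetting the row names gives the 1..5 numbering seen in the dplyr output
top5_reset <- top5
rownames(top5_reset) <- NULL
```

Here kept_rows can be used to index back into df, for example df[kept_rows, ], which is handy when the selection has to be matched against other objects with the same row order.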