Sort the highest k results using dplyr


I can select the k largest results of a table in R with top_n(). For example, with k = 5 I get the following result:

library(dplyr)
library(ggplot2)

top_n(mpg, 5, wt=displ)
# A tibble: 5 × 11
  manufacturer              model displ  year   cyl      trans   drv   cty
         <chr>              <chr> <dbl> <int> <int>      <chr> <chr> <int>
1    chevrolet           corvette   6.2  2008     8 manual(m6)     r    16
2    chevrolet           corvette   6.2  2008     8   auto(s6)     r    15
3    chevrolet           corvette   7.0  2008     8 manual(m6)     r    15
4    chevrolet    k1500 tahoe 4wd   6.5  1999     8   auto(l4)     4    14
5         jeep grand cherokee 4wd   6.1  2008     8   auto(l5)     4    11
# ... with 3 more variables: hwy <int>, fl <chr>, class <chr>

However, the rows are not sorted by the displ column. I would like them in descending order of displ, as follows:

top_n(mpg, 5, wt=displ)[order(top_n(mpg, 5, wt=displ)$displ, decreasing=TRUE), ]
# A tibble: 5 × 11
  manufacturer              model displ  year   cyl      trans   drv   cty
         <chr>              <chr> <dbl> <int> <int>      <chr> <chr> <int>
1    chevrolet           corvette   7.0  2008     8 manual(m6)     r    15
2    chevrolet    k1500 tahoe 4wd   6.5  1999     8   auto(l4)     4    14
3    chevrolet           corvette   6.2  2008     8 manual(m6)     r    16
4    chevrolet           corvette   6.2  2008     8   auto(s6)     r    15
5         jeep grand cherokee 4wd   6.1  2008     8   auto(l5)     4    11
# ... with 3 more variables: hwy <int>, fl <chr>, class <chr>

The code works, but I find it ugly. How could I simplify it and still get the same result? Note that I call top_n(mpg, 5, wt=displ) twice, which I suspect might slow my code down on a large table. Is there a more elegant way to get this result?

    
asked by anonymous 08.12.2016 / 22:23

2 answers


dplyr supports chaining with the pipe operator (%>%), which makes code easier to read and more succinct. It also provides the arrange() function for sorting results.
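To make the pipe's behaviour concrete: x %>% f(y) is evaluated as f(x, y), so a piped chain is the same computation as nested calls read from the inside out. A minimal sketch comparing the two forms on the question's data (the variable names nested and piped are just for illustration):

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Nested calls: read from the inside out
nested <- arrange(top_n(mpg, 5, displ), desc(displ))

# Piped calls: each %>% passes the left-hand result as the first argument
piped <- mpg %>% top_n(5, displ) %>% arrange(desc(displ))

identical(nested, piped)
```

Both versions call top_n() only once; the pipe just changes the order in which the steps are written.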

res1 <- top_n(mpg, 5, wt=displ)[order(top_n(mpg, 5, wt=displ)$displ, decreasing=TRUE), ]

res2 <- mpg %>% top_n(5, displ) %>% arrange(desc(displ))

identical(res1, res2)
[1] TRUE
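The duplicate call in the question can also be avoided without the pipe, by storing the top_n() result once and indexing into it. A minimal sketch that keeps the question's base-R ordering (top5 and top5_sorted are hypothetical names):

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Run top_n() a single time and keep the result
top5 <- top_n(mpg, 5, wt = displ)

# Reorder that stored result by displ, largest first
top5_sorted <- top5[order(top5$displ, decreasing = TRUE), ]
top5_sorted
```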

Stack Overflow's documentation on the pipe operator is excellent.
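As for why arrange() is needed at all: the dplyr documentation describes top_n() as a wrapper that uses filter() and min_rank(), and filter() keeps the selected rows in their original order. A small sketch to check this, assuming a dplyr version where top_n() is still available (it is superseded, not removed, in dplyr 1.x):

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# top_n() selects the rows ranked within the top 5 by displ ...
via_top_n <- top_n(mpg, 5, displ)

# ... which is the same selection as an explicit filter on min_rank()
via_filter <- filter(mpg, min_rank(desc(displ)) <= 5)

# Both keep the rows in their original table order, not sorted by displ
via_top_n$displ
via_filter$displ
```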

    
08.12.2016 / 22:59

Other ways to do the same thing:

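library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Sort the whole table, then keep the first 5 rows
mpg %>% arrange(desc(displ)) %>% slice(1:5)
# Keep rows ranked 1 to 5 by displ; filter() preserves the original row order, so sort afterwards
mpg %>% filter(row_number(desc(displ)) <= 5) %>% arrange(desc(displ))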
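These alternatives match top_n() only when there is no tie at the cut-off: top_n() ranks with min_rank(), so it keeps every row tied at the last position, while slice(1:5) and row_number() always return exactly 5 rows. In dplyr 1.0.0 and later, slice_max() sorts the result and lets you choose the tie behaviour explicitly. A sketch on a small hypothetical data frame with a tie at the boundary:

```r
library(dplyr)

# Hypothetical toy data: two rows tie for second place
df <- tibble(id = 1:4, x = c(3, 2, 2, 1))

# top_n() keeps both tied rows, so n = 2 returns 3 rows
keep_ties <- top_n(df, 2, x)

# Sorting then slicing always returns exactly 2 rows
fixed_n <- df %>% arrange(desc(x)) %>% slice(1:2)

# slice_max() (dplyr >= 1.0.0) returns rows sorted by x, largest first
max_ties <- slice_max(df, x, n = 2)                         # ties kept by default
max_no_ties <- slice_max(df, x, n = 2, with_ties = FALSE)   # exactly 2 rows
```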
    
09.12.2016 / 11:02