Add rownames as column using dplyr

10

I would like to do something that is fairly simple in base R syntax, but using the dplyr package.

The task is basically to add the row.names of a data.frame object as a column to that same object. Using mtcars as an example, this could be done like this:

dados <- mtcars
dados$nomes <- row.names(mtcars)

I'd like to do something like this:

dados <- mtcars %>% mutate(nomes=row.names(.))

But this code fails with Error: unsupported type for column 'nomes' (NILSXP) (of course, because I'm doing something wrong).

I wonder if there is a way to solve this "problem".

    
asked by anonymous 01.10.2014 / 20:11

4 answers

14

Warning: update in magrittr 1.5

As of magrittr 1.5, the dot (.) of %>% works with nested calls. It now correctly replaces the dot inside row.names(.), so the example works without any modification.

dados <- mtcars %>% mutate(nomes=row.names(.))
head(dados)
   mpg cyl disp  hp drat    wt  qsec vs am gear carb             nomes
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4         Mazda RX4
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4     Mazda RX4 Wag
3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1        Datsun 710
4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1    Hornet 4 Drive
5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 Hornet Sportabout
6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1           Valiant
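
As a small check of this rule, here is a sketch that uses only magrittr and base R. cbind stands in for mutate (my substitution, not part of the original answer), so it should not depend on which dplyr version is installed. Even though the dot appears only inside a nested call, the left-hand side is still passed as the first argument:

```r
library(magrittr)

# The dot occurs only inside the nested call row.names(.), so magrittr (>= 1.5)
# still inserts mtcars as the first argument of cbind().
dados <- mtcars %>% cbind(nomes = row.names(.))

# The new column should match the original row names.
all(as.character(dados$nomes) == row.names(mtcars))
```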

Answer given before magrittr 1.5

Complementing Rogério's answer.

What is %>% doing?

If you look at the code for %>%, roughly speaking it creates a new environment and places the left-hand side in that environment. It then takes the command on the right-hand side, modifies a few things, and evaluates the modified command inside the new environment.

For example, if you run mtcars %>% mutate(., nomes = row.names(.)), the left side is mtcars and the right side is mutate(., nomes = row.names(.)):

lhs <- substitute(mtcars)
rhs <- substitute(mutate(., nomes = row.names(.)))

Create a new environment and a name for the left side:

env <- new.env(parent = parent.frame())
nm <- paste(deparse(lhs), collapse = "")

Save the left side in the new environment with the name you created:

env[[nm]] <- eval(lhs, env)

# To check that the object was created:
head(env$mtcars)

Now the dots in the right-hand command need to be replaced. The part that identifies where the dots are is:

dots <- c(FALSE, vapply(rhs[-1], identical, quote(.), 
                              FUN.VALUE = logical(1)))

But note that it inspects only the first level of the call.

dots
            nomes 
FALSE  TRUE FALSE 

So when the replacement happens, only the first dot is replaced:

rhs[dots] <- rep(list(as.name(nm)), sum(dots))
e <- rhs
e
# note that only the first dot was replaced
mutate(mtcars, nomes = row.names(.))

So when you evaluate the call in the env environment, the error occurs because there is no object named ".":

eval(e, env)
Error in row.names(.) : object '.' not found

The fix would be for the replacement to happen at every level of the call. For example, if we replace the other dot in e manually:

e[[3]][[2]] <- as.name("mtcars")

Now it works:

eval(e, env)
# result omitted because it is large
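
The steps above can be collected into one base-R sketch. To keep it runnable without dplyr, cbind stands in for mutate and the parent environment is globalenv(). Both are my assumptions for illustration, not what magrittr itself does:

```r
# Base-R sketch of the pre-1.5 behaviour; cbind() stands in for mutate().
lhs <- quote(mtcars)
rhs <- quote(cbind(., nomes = row.names(.)))

env <- new.env(parent = globalenv())
nm <- paste(deparse(lhs), collapse = "")
env[[nm]] <- eval(lhs, env)

# Only the top-level arguments are checked for the dot.
dots <- c(FALSE, vapply(rhs[-1], identical, quote(.), FUN.VALUE = logical(1)))
e <- rhs
e[dots] <- rep(list(as.name(nm)), sum(dots))

# The nested dot is still there, so evaluation fails.
first_try <- tryCatch(eval(e, env), error = function(err) err)

# Replacing the nested dot by hand makes the call evaluate.
e[[3]][[2]] <- as.name(nm)
second_try <- eval(e, env)
head(second_try$nomes)
```

Here first_try holds the captured error condition, and second_try is the data frame with the extra nomes column.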

Why did it work with %.% using `__prev`?

The function behind %.% is chain_q. To see the code, type dplyr:::chain_q.

function (calls, env = parent.frame()) 
{
    if (length(calls) == 0) 
        return()
    if (length(calls) == 1) 
        return(eval(calls[[1]], env))
    e <- new.env(parent = env)
    e$`__prev` <- eval(calls[[1]], env)
    for (call in calls[-1]) {
        new_call <- as.call(c(call[[1]], quote(`__prev`), as.list(call[-1])))
        e$`__prev` <- eval(new_call, e)
    }
    e$`__prev`
}

Note that the function creates a new environment called e and stores the result of the first call in the chain under the name `__prev` (e$`__prev` <- eval(calls[[1]], env)), so that each subsequent command can access the result of the previous one in this way.
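
To see the idea in isolation, here is a minimal sketch with a hypothetical helper, chain_sketch, written for illustration only (it is not part of dplyr). It uses base functions, subset and cbind, in place of dplyr verbs:

```r
# Hypothetical re-implementation of the chaining idea, for illustration only.
chain_sketch <- function(calls, env = parent.frame()) {
  e <- new.env(parent = env)
  # Result of the first step, stored under the non-syntactic name `__prev`.
  e$`__prev` <- eval(calls[[1]], env)
  for (cl in calls[-1]) {
    # Insert `__prev` as the first argument of each subsequent call.
    new_call <- as.call(c(list(cl[[1]], quote(`__prev`)), as.list(cl[-1])))
    e$`__prev` <- eval(new_call, e)
  }
  e$`__prev`
}

res <- chain_sketch(list(
  quote(mtcars),
  quote(subset(cyl == 6)),
  quote(cbind(nomes = row.names(`__prev`)))
))
res$nomes
```

Because every step is evaluated inside e, a later call can refer to `__prev` explicitly, which is why row.names(`__prev`) resolves to the row names of the previous result.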

Hacking %>% (for illustration only)

If we write a function that replaces all the dots, like this (based on this SO-en question):

convert.call <- function(x, replacement) {
  if (is.call(x)) as.call(lapply(x, convert.call, replacement=replacement)) else
    if (identical(x, quote(.))) as.name(replacement) else
      x
}
# testing
expr <- substitute(mean(exp(sqrt(.)), .))
convert.call(expr, "x")
# mean(exp(sqrt(x)), x)

With it we can hack the definition of %>% so that all the dots are replaced:

`%>%` <- function (lhs, rhs) 
{
  convert.call <- function(x, replacement) {
    if (is.call(x)) as.call(lapply(x, convert.call, replacement=replacement)) else
      if (identical(x, quote(.))) as.name(replacement) else
        x
  }

  lhs <- substitute(lhs)
  rhs <- substitute(rhs)
  if (is.call(rhs) && identical(rhs[[1]], quote(`(`))) 
    rhs <- eval(rhs, parent.frame(), parent.frame())
  if (!any(is.symbol(rhs), is.call(rhs), is.function(rhs))) 
    stop("RHS should be a symbol, a call, or a function.")
  env <- new.env(parent = parent.frame())
  nm <- paste(deparse(lhs), collapse = "")
  nm <- if (nchar(nm) < 9900 && (is.call(lhs) || is.name(lhs))) 
    nm
  else "__LHS"
  env[[nm]] <- eval(lhs, env)
  if (is.function(rhs)) {
    res <- withVisible(rhs(env[[nm]]))
  }
  else if (is.call(rhs) && deparse(rhs[[1]]) == "function") {
    res <- withVisible(eval(rhs, parent.frame(), parent.frame())(eval(lhs, 
                                                                      parent.frame(), parent.frame())))
  }
  else {
    if (is.symbol(rhs)) {
      if (!exists(deparse(rhs), parent.frame(), mode = "function")) 
        stop("RHS appears to be a function name, but it cannot be found.")
      e <- call(as.character(rhs), as.name(nm))
    }
    else {
      e <- convert.call(rhs, nm)
    }
    res <- withVisible(eval(e, env))
  }
  if (res$visible) 
    res$value
  else invisible(res$value)
}

Now mtcars %>% mutate(., nomes = row.names(.)) works. I put this here only to explain what is going on, though. I would not recommend using the hacked version of %>% because it may cause bugs in other situations. For example, with this version you have to write the dots explicitly every time, as in mtcars %>% filter(., cyl==4) %>% mutate(., nomes = row.names(.)).

dplyr does not necessarily keep row.names in operations

One last note: neither dplyr nor data.table keeps row.names intact during operations. Note that dplyr resets row.names in filter, and data.table discards them when you convert the data.frame:

mt_dplyr <- filter(mtcars, cyl==4)
row.names(mt_dplyr)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11"

mt_dt <- data.table(mtcars)
row.names(mt_dt)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22"
[23] "23" "24" "25" "26" "27" "28" "29" "30" "31" "32"

So, in the end, if row.names contains relevant information, it seems safer to turn it into a column before manipulating the data any further.
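
A minimal sketch of that advice, assuming dplyr is installed. It passes row.names(mtcars) explicitly, so it does not rely on how the pipe handles the dot. Whether filter keeps the original row names may vary between dplyr versions, which is one more reason not to depend on them:

```r
library(dplyr)

# Copy the row names into a regular column first...
dados <- mtcars %>% mutate(nomes = row.names(mtcars))

# ...so the labels travel with the rows through later verbs.
quatro <- dados %>% filter(cyl == 4)
quatro$nomes
```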

An alternative "solution": creating your own mutate function with a local row_names

One possible solution is the following: you create your own mutate that stores a row_names vector in its parent environment (in this context that will be the %>% environment, but if you call the function on its own it will be the global environment, so be careful) and then runs dplyr's mutate in that environment. Then, whenever you want to use the row names, you just use the row_names object. Let's call our version of mutate mutate2:

mutate2 <- function(x, ...){
  assign("row_names", row.names(x), parent.frame())
  eval(substitute(mutate(x, ...)), parent.frame())
}

mtcars %>% mutate2(z = cyl^2, nomes=row_names) %>% filter(z==36)

   mpg cyl  disp  hp drat    wt  qsec vs am gear carb  z          nomes
1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 36      Mazda RX4
2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 36  Mazda RX4 Wag
3 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 36 Hornet 4 Drive
4 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 36        Valiant
5 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4 36       Merc 280
6 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4 36      Merc 280C
7 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6 36   Ferrari Dino
    
02.10.2014 / 05:26
6

Julio,

I could not think of a solution using dplyr, but a simple alternative that may also keep the code cleaner is to create a row_names function as follows:

row_names <- function(x, var){
  var <- deparse(substitute(var))
  x[var] <- row.names(x)
  return(x)
}

Then you can use it like this:

mtcars %>% row_names(nomes) %>% filter(cyl == 6)

This may lose some of dplyr's efficiency, but it reads nicely.

Edit:

You can also write a function that does something similar to Rogério's first solution, using dplyr:

row_names_d <- function(x, var){
  var <- deparse(substitute(var))
  x <- mutate(x, rn = row.names(x))
  names(x)[length(names(x))] <- var
  return(x)
}

mtcars %>% row_names_d(nome)

But I ran a benchmark and it does not seem worth it:

library(microbenchmark)
microbenchmark(
  mtcars %>% row_names(nome),
  mtcars %>% row_names_d(nome)
)
# Unit: microseconds
#                         expr     min       lq   median      uq     max neval
#   mtcars %>% row_names(nome) 183.334 194.0015 202.4965 210.399 326.760   100
# mtcars %>% row_names_d(nome) 244.972 259.5905 268.4810 279.149 551.581   100
    
02.10.2014 / 13:35
4

Try this here:

    library(dplyr)
    dados <- mtcars
    dados %>% mutate(names = row.names(dados))

You can also do:

    dados %.% mutate(names = row.names(`__prev`))

`__prev` (between backticks) refers to the previous element of the chain. This argument only works when you replace %>% with %.%.

See also this post on SO-en: link

    
01.10.2014 / 20:53
3

There is a function called rownames_to_column in the tibble package that does this:

mtcars %>% rownames_to_column()
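
As a slightly fuller sketch, assuming both tibble and dplyr are installed, the var argument lets you choose the name of the new column, which is added as the first column of the result:

```r
library(dplyr)
library(tibble)

# `var` names the new column; it defaults to "rowname".
seis <- mtcars %>%
  rownames_to_column(var = "nomes") %>%
  filter(cyl == 6)
seis$nomes
```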
    
17.02.2017 / 03:30