Warning: update as of magrittr 1.5
As of magrittr 1.5, the dot (.) of %>% also works inside nested calls. It now correctly replaces the dot within row.names(.), so the example works without any modification:
dados <- mtcars %>% mutate(nomes=row.names(.))
head(dados)
   mpg cyl disp  hp drat    wt  qsec vs am gear carb             nomes
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4         Mazda RX4
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4     Mazda RX4 Wag
3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1        Datsun 710
4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1    Hornet 4 Drive
5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 Hornet Sportabout
6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1           Valiant
Answer given before magrittr 1.5
Complementing Rogério's answer.
What is %>% doing?
Roughly speaking, the code of %>% creates a new environment and evaluates the left-hand side in it. It then takes the call on the right-hand side, modifies it, and evaluates the modified call within this new environment.
For example, if you run mtcars %>% mutate(., nomes = row.names(.)), the left side is mtcars and the right side is mutate(., nomes = row.names(.)):
lhs <- substitute(mtcars)
rhs <- substitute(mutate(., nomes = row.names(.)))
Create a new environment and a name for the left side:
env <- new.env(parent = parent.frame())
nm <- paste(deparse(lhs), collapse = "")
Save the left side in the new environment with the name you created:
env[[nm]] <- eval(lhs, env)
# To see that the object was created:
head(env$mtcars)
Now the dots in the right-hand call need to be replaced. The part that identifies where the dots are is:
dots <- c(FALSE, vapply(rhs[-1], identical, quote(.),
FUN.VALUE = logical(1)))
But note that it only inspects the first level of the call:
dots
nomes
FALSE TRUE FALSE
Therefore, at replacement time, only the first dot is replaced:
rhs[dots] <- rep(list(as.name(nm)), sum(dots))
e <- rhs
e
# note that only the first dot was replaced
mutate(mtcars, nomes = row.names(.))
So when you evaluate the call in the env environment, there is no object named "." and the error occurs:
eval(e, env)
Error in row.names(.) : object '.' not found
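The first-level limitation can be reproduced with base R alone. The following is only a sketch of the idea, not magrittr's code: the call is captured with quote(), so mutate is never actually run and dplyr is not needed. The names top_level and nested_arg are made up for this illustration.

```r
# Capture the right-hand side without evaluating it.
rhs <- quote(mutate(., nomes = row.names(.)))

# First-level check, in the spirit of the pre-1.5 code:
# compare each argument of the call with the symbol `.`
top_level <- vapply(as.list(rhs)[-1], identical, logical(1), quote(.))
top_level

# The second argument is itself a call, so it is not identical to `.` ...
nested_arg <- rhs[[3]]
is.call(nested_arg)             # TRUE

# ... even though the dot does appear inside it.
"." %in% all.names(nested_arg)  # TRUE
```

Any check that stops at the top-level arguments will miss the dot inside row.names(.), which is why that dot is left untouched.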
The solution would be for the replacement to happen at all levels of the call. For example, if we change the other dot of e manually:
e[[3]][[2]] <- as.name("mtcars")
Now it works:
eval(e, env)
# result omitted because it is long
Why did it work with %.% using __prev?
The function behind %.% is chain_q. To see the code, type dplyr:::chain_q.
function (calls, env = parent.frame())
{
    if (length(calls) == 0)
        return()
    if (length(calls) == 1)
        return(eval(calls[[1]], env))
    e <- new.env(parent = env)
    e$`__prev` <- eval(calls[[1]], env)
    for (call in calls[-1]) {
        new_call <- as.call(c(call[[1]], quote(`__prev`), as.list(call[-1])))
        e$`__prev` <- eval(new_call, e)
    }
    e$`__prev`
}
Note that the function creates a new environment called e and stores the result of the first call in the chain under the name `__prev` (e$`__prev` <- eval(calls[[1]], env)), so that the result of the previous command can be accessed by that name.
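To make the mechanism concrete, here is a minimal base R sketch of the same idea. It is not dplyr's code: chain_sketch is a hypothetical helper, the calls are written out by hand with `__prev` already in place, and transform and subset stand in for mutate and filter so that no package is required.

```r
# Hypothetical helper: evaluate a list of unevaluated calls in sequence,
# exposing each intermediate result under the name `__prev`.
chain_sketch <- function(calls, env = parent.frame()) {
  e <- new.env(parent = env)
  e$`__prev` <- eval(calls[[1]], env)
  for (cl in calls[-1]) {
    # `__prev` is an ordinary object in e, so it is visible at any
    # nesting depth of the call being evaluated.
    e$`__prev` <- eval(cl, e)
  }
  e$`__prev`
}

res <- chain_sketch(list(
  quote(mtcars),
  quote(transform(`__prev`, nomes = row.names(`__prev`))),
  quote(subset(`__prev`, cyl == 4))
))
head(res)
```

Because `__prev` is looked up like any other variable, nothing has to be rewritten inside row.names(...); that is the difference from the dot, which only exists if the call itself is modified.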
Hacking %>% (for illustration only)
Suppose we write a function that replaces all the dots, like this (based on this Stack Overflow (English) question):
convert.call <- function(x, replacement) {
if (is.call(x)) as.call(lapply(x, convert.call, replacement=replacement)) else
if (identical(x, quote(.))) as.name(replacement) else
x
}
# testing
expr <- substitute(mean(exp(sqrt(.)), .))
convert.call(expr, "x")
# mean(exp(sqrt(x)), x)
Now we can hack the definition of %>% so that all the dots are replaced:
'%>%' <- function (lhs, rhs)
{
convert.call <- function(x, replacement) {
if (is.call(x)) as.call(lapply(x, convert.call, replacement=replacement)) else
if (identical(x, quote(.))) as.name(replacement) else
x
}
lhs <- substitute(lhs)
rhs <- substitute(rhs)
if (is.call(rhs) && identical(rhs[[1]], quote(`(`)))
rhs <- eval(rhs, parent.frame(), parent.frame())
if (!any(is.symbol(rhs), is.call(rhs), is.function(rhs)))
stop("RHS should be a symbol, a call, or a function.")
env <- new.env(parent = parent.frame())
nm <- paste(deparse(lhs), collapse = "")
nm <- if (nchar(nm) < 9900 && (is.call(lhs) || is.name(lhs)))
nm
else "__LHS"
env[[nm]] <- eval(lhs, env)
if (is.function(rhs)) {
res <- withVisible(rhs(env[[nm]]))
}
else if (is.call(rhs) && deparse(rhs[[1]]) == "function") {
res <- withVisible(eval(rhs, parent.frame(), parent.frame())(eval(lhs,
parent.frame(), parent.frame())))
}
else {
if (is.symbol(rhs)) {
if (!exists(deparse(rhs), parent.frame(), mode = "function"))
stop("RHS appears to be a function name, but it cannot be found.")
e <- call(as.character(rhs), as.name(nm))
}
else {
e <- convert.call(rhs, nm)
}
res <- withVisible(eval(e, env))
}
if (res$visible)
res$value
else invisible(res$value)
}
Now mtcars %>% mutate(., nomes = row.names(.)) works. I include this only to explain what is going on; I would not recommend using the hacked version of %>%, because it may cause bugs elsewhere. For example, you now have to write the dots explicitly every time, as in mtcars %>% filter(., cyl==4) %>% mutate(., nomes = row.names(.)).
dplyr does not necessarily keep row.names in operations
One last note: neither dplyr nor data.table keeps row.names intact during operations. Note that dplyr resets row.names in filter, and data.table discards them when you convert the data.frame:
mt_dplyr <- filter(mtcars, cyl==4)
row.names(mt_dplyr)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11"
mt_dt <- data.table(mtcars)
row.names(mt_dt)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22"
[23] "23" "24" "25" "26" "27" "28" "29" "30" "31" "32"
So, in the end, if row.names contains relevant information, it seems safer to turn it into a column before manipulating the data any further.
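As a sketch of that advice using only base R (the object names dados and quatro are just for illustration), the row names can be copied into a regular column before any other step:

```r
# Copy the row names into an ordinary column first.
dados <- mtcars
dados$nomes <- row.names(mtcars)
row.names(dados) <- NULL  # the row names are no longer the only identifier

# Later steps may renumber or drop row names; the column keeps the car names.
quatro <- dados[dados$cyl == 4, ]
quatro$nomes
```

Once the names live in a column, it no longer matters whether a later filter or a conversion to data.table rewrites the row names.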
An alternative "solution": creating your own mutate function that has a local row_names
One possible workaround is the following: write your own mutate that stores a row_names vector in its calling environment (inside a pipeline this is the %>% environment, but if you call the function on its own at the top level it is the global environment, so be careful) and then runs dplyr's mutate in that environment. Then, whenever you need the row names, just use the row_names object. Let's call this version mutate2:
mutate2 <- function(x, ...){
assign("row_names", row.names(x), parent.frame())
eval(substitute(mutate(x, ...)), parent.frame())
}
mtcars %>% mutate2(z = cyl^2, nomes=row_names) %>% filter(z==36)
   mpg cyl  disp  hp drat    wt  qsec vs am gear carb  z          nomes
1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 36      Mazda RX4
2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 36  Mazda RX4 Wag
3 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 36 Hornet 4 Drive
4 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 36        Valiant
5 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4 36       Merc 280
6 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4 36      Merc 280C
7 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6 36   Ferrari Dino