How does the dplyr::n() function know that it is not being called from the global environment?


When calling the dplyr::n() function in the global environment, an error occurs.

# Error: This function should not be called directly

This error makes sense and I was curious to see how it was implemented.

# function () 
# {
#     abort("This function should not be called directly")
# }
# <bytecode: 0x000000001650f200>
# <environment: namespace:dplyr>

To my surprise, however, there is no if or any other condition check; the function simply throws the error. Yet the error does not occur when we call n() in its natural habitat:

mtcars %>% 
  group_by(cyl) %>% 
  summarise(n = n())

# # A tibble: 3 x 2
#     cyl     n
#   <dbl> <int>
# 1     4    11
# 2     6     7
# 3     8    14

So two questions remain:

  • How does the n() function know that it is being called in another context? and
  • How does n() count? (where is the source code for that part)

asked by anonymous 12.12.2018 / 20:00

    1 answer


    The n() function only works inside dplyr verbs and is part of an internal mechanism of the package called Hybrid Evaluation. The full description is here.

    Hybrid Evaluation is one of the main features that makes dplyr fast for some tasks.

    In principle, when you call summarise, for example summarise(n = n()), dplyr would need to execute this function once for each chunk of the data. This could be costly if the data has many groups. This is why dplyr recognizes some expressions, such as n() and sum(variable), and handles them directly in C++ code.
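The idea of recognizing an expression before evaluating it can be sketched in base R. This is only an illustration under stated assumptions: group_sizes() is a hypothetical helper, not part of dplyr, and the fast path uses table() where dplyr would use C++ code. Note that no n() function is defined or called at all; the helper only inspects the unevaluated expression.

```r
# Hypothetical helper, not dplyr's code: look at the unevaluated expression
# and take a vectorised shortcut when it is exactly `n()`.
group_sizes <- function(data, group_col, expr) {
  expr <- substitute(expr)          # capture the expression without running it
  groups <- data[[group_col]]
  if (identical(expr, quote(n()))) {
    # Fast path: count every group in a single pass.
    counts <- table(groups)
    return(setNames(as.integer(counts), names(counts)))
  }
  # Slow path: evaluate the expression once per group, using the
  # group's columns as variables.
  caller <- parent.frame()
  vapply(split(data, groups),
         function(chunk) eval(expr, chunk, caller),
         numeric(1))
}

sizes <- group_sizes(mtcars, "cyl", n())
print(sizes)
```

Any other expression, for example group_sizes(mtcars, "cyl", mean(mpg)), falls through to the slower per-group evaluation.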

    For the n() function, the entry point of its definition is in this file: link

    So n() does not actually know that it is being called in another context; it is dplyr that changes its meaning when the function is used within mutate or summarise.
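A minimal base R sketch of how a caller can change what a name means during evaluation, assuming nothing about dplyr's internals: count_by() is a hypothetical helper, and the n() defined here is only a stand-in that mimics the error from the question.

```r
# Stand-in for dplyr::n(): calling it directly always fails.
n <- function() {
  stop("This function should not be called directly", call. = FALSE)
}

# Hypothetical helper, not dplyr's implementation: evaluates `expr` once per
# group in an environment where `n` is rebound to return the group size.
count_by <- function(data, group_col, expr) {
  expr <- substitute(expr)
  caller <- parent.frame()
  groups <- split(data, data[[group_col]])
  vapply(groups, function(chunk) {
    mask <- new.env(parent = caller)
    mask$n <- function() nrow(chunk)   # shadows the erroring n() above
    eval(expr, envir = mask)
  }, integer(1))
}

counts <- count_by(mtcars, "cyl", n())
print(counts)   # 11, 7 and 14 rows for cyl 4, 6 and 8
```

Calling n() at the top level still raises the error, because the rebinding exists only inside the environment that count_by() builds for each group.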

    14.12.2018 / 20:11