Run a function defined in .GlobalEnv in parallel with multidplyr

3

I need to run a function that lives in .GlobalEnv in parallel, using the multidplyr package.

A simple example without parallel processing works as expected:

library(dplyr)
library(purrr)
library(multidplyr)

# helper defined in .GlobalEnv
add_a <- function(x) {
  paste0(x, "A")
}

data.frame(x = 1:10) %>%
  mutate(y = purrr::map(x, add_a))

But when I add parallelism with partition(), the function add_a is no longer found:

add_a <- function(x) {
  paste0(x, "A")
}

data.frame(x = 1:10) %>%
  partition() %>% 
  mutate(
    y = purrr::map(x, add_a)
  )

This returns the following error (the Portuguese part means "object 'add_a' not found"):

Error in checkForRemoteErrors(lapply(cl, recvResult)) : 
  10 nodes produced errors; first error: objeto 'add_a' não encontrado 
    
asked by anonymous 21.02.2017 / 19:24

1 answer

2

You have to export the add_a object to each node in the cluster.
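
The reason is that each node is a separate R process with its own, initially empty, global environment, so a name that exists only in your session cannot be resolved there. As a minimal sketch of the same principle, here is the export step done with base R's parallel package instead of multidplyr (the two-worker cluster size and the variable names cl, before and after are just choices made for this illustration):

```r
library(parallel)

add_a <- function(x) {
  paste0(x, "A")
}

cl <- makeCluster(2)  # two fresh R worker processes

# Before exporting, the workers cannot resolve the name add_a,
# so this call fails; the error message is captured instead of stopping.
before <- tryCatch(
  parLapply(cl, 1:4, function(i) add_a(i)),
  error = function(e) conditionMessage(e)
)

# Copy add_a from the current environment into each worker's global environment
clusterExport(cl, "add_a", envir = environment())

# The same call now succeeds
after <- parLapply(cl, 1:4, function(i) add_a(i))

stopCluster(cl)

unlist(after)
```

multidplyr needs the same kind of export, only through its own helper functions.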

One way to do this is to create the cluster manually and assign the function on each node.

For example:

library(dplyr)
library(purrr)
library(multidplyr)

add_a <- function(x) {
  paste0(x, "A")
}

cluster <- create_cluster() # create the cluster
cluster_assign_value(cluster, "add_a", add_a) # assign the function add_a on each node

data.frame(x = 1:10) %>%
  partition(cluster = cluster) %>% # specify which cluster to use
  mutate(
    y = purrr::map(x, add_a)
  )

Source: party_df [10 x 3]
Groups: PARTITION_ID
Shards: 7 [1--2 rows]

# S3: party_df
       x PARTITION_ID         y
   <int>        <dbl>    <list>
1      1            1 <chr [1]>
2      6            1 <chr [1]>
3      9            2 <chr [1]>
4      5            3 <chr [1]>
5      7            3 <chr [1]>
6      2            4 <chr [1]>
7      3            5 <chr [1]>
8     10            5 <chr [1]>
9      8            6 <chr [1]>
10     4            7 <chr [1]>
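
The function names above come from the development version of multidplyr that was current in early 2017. As far as I know, the later CRAN releases (0.1.0 onward) renamed them, so if create_cluster() is not found in your installation, something along these lines should be the equivalent. This is a sketch assuming that newer API (new_cluster(), cluster_library(), cluster_assign(), collect()); the worker count of 2 is arbitrary:

```r
library(dplyr)
library(purrr)
library(multidplyr)

add_a <- function(x) {
  paste0(x, "A")
}

cluster <- new_cluster(2)              # start two worker processes
cluster_library(cluster, "dplyr")      # load dplyr on every worker
cluster_assign(cluster, add_a = add_a) # assign add_a on every worker

result <- data.frame(x = 1:10) %>%
  partition(cluster) %>%               # spread the rows across the workers
  mutate(
    y = purrr::map(x, add_a)
  ) %>%
  collect()                            # bring the rows back to the main session

result
```

Row order after collect() depends on how the rows were distributed, so sort by x if the order matters.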
    
25.02.2017 / 16:56