How to parallelize an sapply over the rows of a data frame


I can run sapply without problems, but I can't parallelize it. My original script has more than 9,000,000 rows, so continuing without parallelization is unfeasible.

dfteste<-data.frame(c(1,1,1),c(1,1,1),c(1,1,1))
apteste<-sapply(1:3,function (x) {paste(dfteste[x,], collapse="-")})

library(parallel)
cl<-makeCluster(4)
apteste<-parSapply(cl,1:3,function (x) {paste(dfteste[x,], collapse="-")}) # does not work
stopCluster(cl)

Thank you.

    
asked by anonymous 25.04.2018 / 16:43

1 answer


The problem is that the dfteste object does not exist in the worker processes created by makeCluster(). You create the object in your current R session, but makeCluster(4) starts 4 separate R sessions (workers), and dfteste does not exist in any of them.
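One way to confirm this (a minimal sketch; the cluster size of 2 and the name seen_on_workers are arbitrary choices for illustration) is to ask every worker whether it can find dfteste, using clusterEvalQ(), which evaluates an expression in each worker's global environment:

```r
library(parallel)

dfteste <- data.frame(c(1, 1, 1), c(1, 1, 1), c(1, 1, 1))

cl <- makeCluster(2)  # default PSOCK workers: fresh, empty R sessions
# Evaluate exists("dfteste") on each worker; the master's objects are not copied over
seen_on_workers <- unlist(clusterEvalQ(cl, exists("dfteste")))
stopCluster(cl)

seen_on_workers
# [1] FALSE FALSE
```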

Possible solution: export the dfteste object to the workers with the clusterExport() function:

library(parallel)

cl <- makeCluster(4)
dfteste <- data.frame(c(1, 1, 1), c(1, 1, 1), c(1, 1, 1))
sapply(1:3, function (x) {paste(dfteste[x, ], collapse = "-")})
# [1] "1-1-1" "1-1-1" "1-1-1"

clusterExport(cl, "dfteste")
parSapply(cl, 1:3, function (x) {paste(dfteste[x,], collapse = "-")}) # works
#[1] "1-1-1" "1-1-1" "1-1-1"

stopCluster(cl)
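With 9,000,000 rows, sending one row index per task and subsetting dfteste[x, ] on every call adds a lot of overhead. A possible variant (a sketch, not benchmarked here; the names chunks, pieces and part are illustrative) splits the row indices into one block per worker with splitIndices(), passes each block's rows to parLapply() as an argument, so clusterExport() is not needed, and builds all strings of a block in one vectorized call with do.call(paste, c(part, sep = "-")). For the numeric columns in this example it gives the same strings as the row-by-row version; note that slicing on the master temporarily holds a second copy of the data.

```r
library(parallel)

dfteste <- data.frame(c(1, 1, 1), c(1, 1, 1), c(1, 1, 1))

cl <- makeCluster(2)

# One block of row indices per worker
chunks <- splitIndices(nrow(dfteste), length(cl))

# Slice the data frame on the master; each worker receives only its own slice
pieces <- lapply(chunks, function(idx) dfteste[idx, , drop = FALSE])

res <- parLapply(cl, pieces, function(part) {
  # Paste all columns row-wise in a single vectorized call
  do.call(paste, c(part, sep = "-"))
})
stopCluster(cl)

res <- unlist(res)
res
# [1] "1-1-1" "1-1-1" "1-1-1"
```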
    
26.04.2018 / 09:54