Operations on very large lists


I have code that calculates the area of the intersection between two polygons, and I use lists to store the coordinates of the polygon vertices. However, there are many polygons, and the whole code takes about 6 hours to run. Do you know of any list operations that could speed up the procedure?

My code

```
require(dplyr); require(rgeos); require(sp)

sim.polygons = function(objects, vertex){
  polygons = vector("list", objects)
  for(i in 1:objects) polygons[[i]] = matrix(runif(vertex*2), ncol = 2)
  return(polygons)
}

teste = function(lista1, lista2, progress = F){
  lista1 = lapply(lista1, as, Class = "gpc.poly")
  lista2 = lapply(lista2, as, Class = "gpc.poly")
  res = matrix(0, nrow = length(lista2), ncol = length(lista1))
  for(k in 1:length(lista1)){
    for(l in 1:length(lista2)){
      res[l, k] = area.poly(intersect(lista1[[k]], lista2[[l]])) # bottleneck of the code
    }
    if(progress == T) print(k)
  }
  res
}

# example
a = sim.polygons(50, 3)  # in my problem, objects = 144 and vertex = 3
b = sim.polygons(100, 3) # objects = 144^2 and vertex = 3

teste(a, b, T)
```

asked by anonymous 14.06.2016 / 05:54


I was unable to speed up your code itself; the best I can propose is a solution that runs in parallel.

```
teste2 <- function(lista1, lista2, progress = F){
  lista1 = lapply(lista1, as, Class = "gpc.poly")
  lista2 = lapply(lista2, as, Class = "gpc.poly")

  res <- plyr::laply(lista2, function(l2){
    plyr::laply(lista1, function(l1){
      area.poly(intersect(l1, l2)) # bottleneck of the code
    })
  }, .parallel = T)

  res
}
```

Note the `.parallel = T` argument. Next you need to register the backend:

On Windows:

```
library(doSNOW)
library(foreach)
cl <- makeCluster(2)
registerDoSNOW(cl)
# when finished, release the workers with stopCluster(cl)
```

On Linux:

```
library(doMC)
registerDoMC(2)
```

Here 2 is the number of cores to use; your processor may have more.
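If you are not sure how many cores your machine has, the `parallel` package (shipped with base R) can report it; this is a small sketch, and the variable name `n_cores` is just illustrative:

```r
library(parallel)

# number of logical cores detected on this machine
n_cores <- detectCores()
print(n_cores)
```

You can then pass `n_cores` (or `n_cores - 1`, to leave one core free) to `makeCluster()` or `registerDoMC()`.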

```
a = sim.polygons(10, 3) # in my problem, objects = 144 and vertex = 3
b = sim.polygons(20, 3) # objects = 144^2 and vertex = 3
microbenchmark::microbenchmark(
  v1 = teste(a, b, F),
  v2 = teste2(a, b, F),
  times = 5
)

Unit: milliseconds
 expr      min       lq     mean   median       uq       max neval
   v1 569.4241 629.3930 819.8292 833.3761 889.4672 1177.4855     5
   v2 445.0611 465.1625 548.7329 483.9004 598.9802  750.5603     5
```

With only two cores the reduction is modest, but if your computer has 4 the gain may be significant.

The problem is that the `area.poly(intersect(a , b))` function itself is slow:

```
> a <- as(a[[1]], "gpc.poly")
> b <- as(b[[1]], "gpc.poly")
> microbenchmark::microbenchmark(
+     area.poly(intersect(a, b))
+ )
Unit: milliseconds
                       expr    min      lq     mean median      uq    max neval
 area.poly(intersect(a, b)) 2.9008 2.97925 3.146169 3.0493 3.33235 4.0275   100
```

Note that in my example it is called 200 times (10 × 20), which accounts for essentially all of the benchmarked time:

```
> 10*20*3.146169
[1] 629.2338
```

That is close to the measured total, so assembling the results matrix adds very little to the function's execution time. Scaling the per-call cost to your problem size (144 × 144² = 144³ calls):

```
> 144^3*3.146169/1000/60
[1] 156.5735
```

Even without collecting the results, the estimated time would be roughly 2.5 hours (about 157 minutes).
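Assuming near-ideal linear scaling (an optimistic assumption, since scheduling and data transfer add overhead), dividing that estimate by the number of cores gives a rough lower bound on the parallel runtime; the 4-core figure below is hypothetical:

```r
calls <- 144^3          # number of intersections in the real problem
ms_per_call <- 3.146169 # benchmarked cost of one area.poly(intersect(...))
total_min <- calls * ms_per_call / 1000 / 60

total_min      # serial estimate, ~157 minutes
total_min / 4  # hypothetical 4 cores, ideal speedup: ~39 minutes
```

In practice the parallel time will land somewhere between these two numbers.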

answered 14.06.2016 / 15:11