Operations with very large lists

I have code that computes the area of the intersection between two polygons, and to do this I use lists to store the coordinates of the polygons' vertices. However, there are many polygons, and the whole code takes about 6 hours to run. Do you know of any list operations that could help speed this up?

My code

require(dplyr); require(rgeos); require(sp)

# Generates a list of 'objects' random polygons, each with 'vertex' vertices
sim.polygons = function(objects, vertex){
  polygons = vector("list", objects)  # preallocate the list
  for(i in 1:objects) polygons[[i]] = matrix(runif(vertex*2), ncol = 2)
  return(polygons)
}

teste = function(lista1, lista2, progress = F){
  # Convert each coordinate matrix to a "gpc.poly" object
  lista1 = lapply(lista1, as, Class = "gpc.poly")
  lista2 = lapply(lista2, as, Class = "gpc.poly")
  res = matrix(0, nrow = length(lista2), ncol = length(lista1))
  for(k in 1 : length(lista1)){
    for(l in 1 : length(lista2)){
      res[l, k] = area.poly(intersect(lista1[[k]], lista2[[l]])) # bottleneck of the code
    }
    if(progress == T) print(k)
  }
  res
}
# example
a = sim.polygons(50, 3) # in my problem, objects = 144 and vertex = 3
b = sim.polygons(100, 3) # objects = 144^2 and vertex = 3

teste(a, b, T)
asked by anonymous 14.06.2016 / 05:54

1 answer

I was not able to speed up your code itself; the best I can propose is a solution that runs in parallel.

teste2 <- function(lista1, lista2, progress = F){
  lista1 = lapply(lista1, as, Class = "gpc.poly")
  lista2 = lapply(lista2, as, Class = "gpc.poly")

  # The outer laply over lista2 builds the rows of the result matrix,
  # so the output matches res[l, k] from the original version
  res <- plyr::laply(lista2, function(l2){
    plyr::laply(lista1, function(l1){
      area.poly(intersect(l1, l2)) # bottleneck of the code
    })
  }, .parallel = T)

  res
}

Note the .parallel = T argument. Next you need to register the backend:

On Windows:

library(doSNOW)
library(foreach)
cl <- makeCluster(2)  # create a cluster with 2 workers
registerDoSNOW(cl)    # register it as the parallel backend

On Linux:

library(doMC)
registerDoMC(2)  # register 2 workers via forking
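
If you want a single setup that works on both systems, the doParallel package (assuming it is installed from CRAN) provides a backend for Windows and Linux alike; a minimal sketch:

library(doParallel)
cl <- makeCluster(2)    # create 2 worker processes
registerDoParallel(cl)  # register them as the foreach backend used by plyr
# ... run teste2(a, b) here ...
stopCluster(cl)         # release the workers when done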

Here, 2 is the number of cores in your processor (yours may have more).
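
If you are not sure how many cores the machine has, parallel::detectCores() can query it; a minimal sketch, assuming the doSNOW backend above:

n_cores <- parallel::detectCores()  # may return NA on some systems
if (is.na(n_cores)) n_cores <- 2    # fall back to 2 workers
cl <- makeCluster(n_cores)
registerDoSNOW(cl)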

a = sim.polygons(10, 3) # in my problem, objects = 144 and vertex = 3
b = sim.polygons(20, 3) # objects = 144^2 and vertex = 3
microbenchmark::microbenchmark(
  v1 = teste(a,b,F),
  v2 = teste2(a,b,F),
  times = 5
)

Unit: milliseconds
 expr      min       lq     mean   median       uq       max neval
   v1 569.4241 629.3930 819.8292 833.3761 889.4672 1177.4855     5
   v2 445.0611 465.1625 548.7329 483.9004 598.9802  750.5603     5

With two cores the time does not drop by much, but if your computer has 4 the reduction may be significant.
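
As a quick sanity check, the parallel version should return the same matrix as the sequential one; a minimal sketch, assuming the a and b simulated above:

# both versions compute the same length(b) x length(a) matrix
all.equal(teste(a, b), teste2(a, b))  # should return TRUE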

The problem is that the area.poly(intersect(a, b)) call is itself slow:

> a <- as(a[[1]], "gpc.poly") 
> b <- as(b[[1]], "gpc.poly")
> microbenchmark::microbenchmark(
+     area.poly(intersect(a , b)) 
+ )
Unit: milliseconds
                       expr    min      lq     mean median      uq    max neval
 area.poly(intersect(a, b)) 2.9008 2.97925 3.146169 3.0493 3.33235 4.0275   100

Note that in my case it is called 200 times (10 × 20 polygons):

> 10*20*3.146169 
[1] 629.2338

which is roughly the measured time. That is, handling the results does not add much to the function's total execution time; almost all of it is spent in the intersection calls. Scaling this up to the real problem size (144 × 144² intersections):

> 144^3*3.146169/1000/60
[1] 156.5735

Even without collecting the results at all, the estimated time for the intersections alone would be approximately 2.6 hours (about 157 minutes).

14.06.2016 / 15:11