In R, how can I use dplyr to find the minimum distance within each group?


I have a data frame with two numeric variables, lat and long, like this:

> head(pontos_sub)
  id       lat      long
1  0 -22,91223 -43,18810
2  1 -22,91219 -43,18804
3  2 -22,91225 -43,18816
4  3 -22,89973 -43,20855
5  4 -22,89970 -43,20860
6  5 -22,89980 -43,20860

Next, I round both coordinates to 3 decimal places:

pontos_sub$long_r <- round(pontos_sub$long, 3)
pontos_sub$lat_r <- round(pontos_sub$lat, 3)

> head(pontos_sub)
  id       lat      long  long_r   lat_r
1  0 -22,91223 -43,18810 -43,188 -22,912 
2  1 -22,91219 -43,18804 -43,188 -22,912
3  2 -22,91225 -43,18816 -43,188 -22,912
4  3 -22,89973 -43,20855 -43,209 -22,900 
5  4 -22,89970 -43,20860 -43,209 -22,900
6  5 -22,89980 -43,20860 -43,209 -22,900
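The same rounding step can also be written as a single dplyr mutate(). This is a minimal sketch that re-creates the six sample points (with decimal points rather than decimal commas) so it runs on its own. It assumes lat and long are numeric columns.

```r
library(dplyr)

# Re-create the sample points from the question (decimal points, not commas)
pontos_sub <- data.frame(
  id   = 0:5,
  lat  = c(-22.91223, -22.91219, -22.91225, -22.89973, -22.89970, -22.89980),
  long = c(-43.18810, -43.18804, -43.18816, -43.20855, -43.20860, -43.20860)
)

# Round both coordinates to 3 decimal places; each (long_r, lat_r) pair
# then identifies a small grid cell that can be used as a grouping key
pontos_sub <- pontos_sub %>%
  mutate(long_r = round(long, 3),
         lat_r  = round(lat, 3))

# Number of distinct cells (distinct (long_r, lat_r) combinations)
n_distinct(pontos_sub$long_r, pontos_sub$lat_r)
```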

Now I want to use the dplyr package to group the rows by each (long_r, lat_r) pair and, using the distVincentyEllipsoid function from the geosphere package, find the minimum distance between the lat/long points in each group. Something like this:

> newdata <- pontos_sub %>% 
               group_by(long_r,lat_r) %>% 
               summarise(min_long = special_fun(arg), 
                         min_lat = special_fun(arg))

This would produce something like:

> head(newdata)
  long_r   lat_r   min_long  min_lat
1 -43,188 -22,912   xxxxxx   xxxxxxx
4 -43,209 -22,900   xxxxxx   xxxxxxx
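One possible way to fill in special_fun is a minimal sketch like the following. It assumes the geosphere package is installed (it provides distm() and distVincentyEllipsoid()). It also interprets "minimum distance" as the smallest distance, in meters, between two distinct points of the same group. The helper min_pair_dist() and the columns n_points and min_dist are hypothetical names chosen for illustration, and the result has a single min_dist column instead of the min_long/min_lat pair sketched above.

```r
library(dplyr)
library(geosphere)

pontos_sub <- data.frame(
  id   = 0:5,
  lat  = c(-22.91223, -22.91219, -22.91225, -22.89973, -22.89970, -22.89980),
  long = c(-43.18810, -43.18804, -43.18816, -43.20855, -43.20860, -43.20860)
) %>%
  mutate(long_r = round(long, 3), lat_r = round(lat, 3))

# Hypothetical helper. Assumption: "minimum distance" means the smallest
# distance (meters) between two distinct points of one group.
# geosphere expects coordinates as (longitude, latitude).
min_pair_dist <- function(long, lat) {
  if (length(long) < 2) return(NA_real_)   # a single point has no pair
  m <- distm(cbind(long, lat), fun = distVincentyEllipsoid)
  diag(m) <- NA                             # ignore each point's distance to itself
  min(m, na.rm = TRUE)
}

newdata <- pontos_sub %>%
  group_by(long_r, lat_r) %>%
  summarise(n_points = n(),
            min_dist = min_pair_dist(long, lat),
            .groups = "drop")

newdata
```

Vincenty's inverse method iterates until it converges, so for many thousands of points the same pipeline with fun = distGeo or fun = distHaversine (both also in geosphere) may be worth benchmarking.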

Finally, I would like to know whether this is the fastest approach, since I have thousands of rows. Is there a faster way to do this?
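If ellipsoidal accuracy is not essential, one common way to speed this up is to swap Vincenty's formula for the spherical haversine formula. Haversine is closed-form and vectorizes well in base R. This is a minimal sketch under that assumption. It uses a mean Earth radius of about 6371 km, so the results are approximate and will differ slightly from distVincentyEllipsoid(). The helpers haversine_m() and min_dist_by_cell() are hypothetical names.

```r
# Hypothetical helper: great-circle distance in meters using the haversine
# formula on a sphere of radius r (an approximation of the ellipsoid)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))   # pmin guards against rounding slightly above 1
}

# Hypothetical helper: for each rounded cell, the smallest distance
# between two distinct points of that cell
min_dist_by_cell <- function(df) {
  cells <- split(df, list(df$long_r, df$lat_r), drop = TRUE)
  res <- lapply(cells, function(g) {
    n <- nrow(g)
    d <- if (n < 2) NA_real_ else {
      # outer() evaluates the vectorized formula on the full n x n grid of pairs
      m <- outer(seq_len(n), seq_len(n), function(i, j)
        haversine_m(g$long[i], g$lat[i], g$long[j], g$lat[j]))
      min(m[upper.tri(m)])   # each unordered pair once, no self-pairs
    }
    data.frame(long_r = g$long_r[1], lat_r = g$lat_r[1], n_points = n, min_dist = d)
  })
  out <- do.call(rbind, res)
  rownames(out) <- NULL
  out
}

pontos_sub <- data.frame(
  id   = 0:5,
  lat  = c(-22.91223, -22.91219, -22.91225, -22.89973, -22.89970, -22.89980),
  long = c(-43.18810, -43.18804, -43.18816, -43.20855, -43.20860, -43.20860)
)
pontos_sub$long_r <- round(pontos_sub$long, 3)
pontos_sub$lat_r  <- round(pontos_sub$lat, 3)

result <- min_dist_by_cell(pontos_sub)
result
```

Each cell usually holds only a handful of points, so building the full pairwise matrix per cell is cheap. The cost grows with the square of the cell size, not with the square of the whole data set.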

    
asked by anonymous 07.04.2017 / 05:58

1 answer


I tried to do something similar a while ago. This is how I did it; it was the best I could come up with.

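library(dplyr)
library(tibble)
library(tidyr)

# Sample data from the question, with decimal points instead of decimal commas
pontos_sub <- tribble(
  ~id, ~lat, ~long, ~long_r, ~lat_r,
  0, -22.91223, -43.18810, -43.188, -22.912,
  1, -22.91219, -43.18804, -43.188, -22.912,
  2, -22.91225, -43.18816, -43.188, -22.912,
  3, -22.89973, -43.20855, -43.209, -22.900,
  4, -22.89970, -43.20860, -43.209, -22.900,
  5, -22.89980, -43.20860, -43.209, -22.900
)

# Pairwise Euclidean distances (in degrees) between the rounded coordinates.
# The matrix is called dist_mat so it is not confused with the dist() function.
dist_mat <- pontos_sub %>% 
  dplyr::select(long_r, lat_r) %>% 
  dist() %>% 
  as.matrix()

# Label rows and columns with the point ids so "from" and "to" use the same ids
dimnames(dist_mat) <- list(pontos_sub$id, pontos_sub$id)

# Reshape to long format: one row per (from, to) pair
dist_mat %>% 
  tibble::as_tibble() %>% 
  dplyr::mutate(from = as.numeric(pontos_sub$id)) %>%
  tidyr::gather(to, dist, -from)

# A tibble: 36 × 3
    from    to       dist
   <dbl> <chr>      <dbl>
1      0     0 0.00000000
2      1     0 0.00000000
3      2     0 0.00000000
4      3     0 0.02418677
5      4     0 0.02418677
6      5     0 0.02418677
7      0     1 0.00000000
8      1     1 0.00000000
9      2     1 0.00000000
10     3     1 0.02418677
# ... with 26 more rows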
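The long table above still contains self-pairs and pairs from different rounded cells. Also, distances between rounded coordinates are always zero inside a cell. This is a minimal sketch of how the same matrix-then-long-format idea could be carried through to one minimum per cell. It uses the unrounded long/lat and switches to pivot_longer(), the successor of gather(). It assumes Euclidean distance in degrees is an acceptable approximation; this is not a geodesic distance in meters. The names nearest, cells and min_dist_deg are illustrative.

```r
library(dplyr)
library(tibble)
library(tidyr)

pontos_sub <- tribble(
  ~id, ~lat, ~long, ~long_r, ~lat_r,
  0, -22.91223, -43.18810, -43.188, -22.912,
  1, -22.91219, -43.18804, -43.188, -22.912,
  2, -22.91225, -43.18816, -43.188, -22.912,
  3, -22.89973, -43.20855, -43.209, -22.900,
  4, -22.89970, -43.20860, -43.209, -22.900,
  5, -22.89980, -43.20860, -43.209, -22.900
)

# Assumption: Euclidean distance on the unrounded coordinates (degrees, not meters)
dist_mat <- as.matrix(dist(pontos_sub[, c("long", "lat")]))
dimnames(dist_mat) <- list(pontos_sub$id, pontos_sub$id)

# Cell membership of each point
cells <- select(pontos_sub, id, long_r, lat_r)

nearest <- as_tibble(dist_mat) %>%
  mutate(from = pontos_sub$id) %>%
  pivot_longer(-from, names_to = "to", values_to = "dist") %>%
  mutate(to = as.numeric(to)) %>%
  filter(from != to) %>%                                  # drop self-pairs
  inner_join(cells, by = c("from" = "id")) %>%
  inner_join(cells, by = c("to" = "id"), suffix = c("", "_to")) %>%
  filter(long_r == long_r_to, lat_r == lat_r_to) %>%      # keep pairs in the same cell
  group_by(long_r, lat_r) %>%
  summarise(min_dist_deg = min(dist), .groups = "drop")

nearest
```

The full n x n matrix grows quadratically with the total number of points, so for thousands of rows a per-group computation is likely to scale better than one global matrix.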
    
16.04.2017 / 00:01