Split Column into Other Two Lat Long

4

I have the following spreadsheets:

  BIRDS.GENERO         SP1        SP2                       XLOCAL
  Thamnomanes      caesius    glaucus                0°32'S52°12'W 
  Thamnomanes   ardesiacus  obidensis                0°10'S51°50'W
  Thamnomanes      caesius    glaucus         00°53'44"N52°00'08"W 
  Thamnomanes   ardesiacus  obidensis         00°53'44"N52°00'08"W 
  Thamnomanes   ardesiacus  obidensis               01°40'N51°23'W
  Thamnomanes   ardesiacus  obidensis               01°40'N51°23'W
  Thamnomanes      caesius hoffmannsi   03°58'14,82"S49°52'56,37"W
  Thamnomanes      caesius hoffmannsi   03°58'14,82"S49°52'56,37"W
  Thamnomanes      caesius hoffmannsi     03°31'46,9"S51°44'05,2"W
  Thamnomanes      caesius hoffmannsi     03°31'46,9"S51°44'05,2"W

I would like to split the XLOCAL column into Latitude and Longitude. So:

 BIRDS.GENERO         SP1        SP2             XLOCAL                   Lat            Long
 Thamnomanes      caesius    glaucus                0°32'S52°12'W         0°32'S        52°12'W
 Thamnomanes   ardesiacus  obidensis                0°10'S51°50'W         0°10'S        51°50'W
 Thamnomanes      caesius    glaucus         00°53'44"N52°00'08"W     00°53'44"N     52°00'08"W
 Thamnomanes   ardesiacus  obidensis         00°53'44"N52°00'08"W     00°53'44"N     52°00'08"W 
 Thamnomanes   ardesiacus  obidensis               01°40'N51°23'W        01°40'N        51°23'W
 Thamnomanes   ardesiacus  obidensis               01°40'N51°23'W        01°40'N        51°23'W
 Thamnomanes      caesius hoffmannsi   03°58'14,82"S49°52'56,37"W  03°58'14,82"S  49°52'56,37"W 
 Thamnomanes      caesius hoffmannsi   03°58'14,82"S49°52'56,37"W  03°58'14,82"S  49°52'56,37"W
 Thamnomanes      caesius hoffmannsi     03°31'46,9"S51°44'05,2"W   03°31'46,9"S   51°44'05,2"W
 Thamnomanes      caesius hoffmannsi     03°31'46,9"S51°44'05,2"W   03°31'46,9"S   51°44'05,2"W

Searching the forum I found something like this:

dados$lat<- str_sub(dados$XLOCAL, end = 6)
head(x)
dados$long<- str_sub(dados$XLOCAL, start  = 7)

As you can see there is a large variation in the size of the information. So the above argument does not work.

Thank you in advance.

    
asked by anonymous 28.11.2018 / 17:15

2 answers

5

You can use strsplit to separate the coordinates, taking advantage of the "look back" option of regular expressions. Because strsplit returns a list, the most practical is to first throw the result to an object and then include each column in the original data.frame.

dados <- read.table(text = c(
 "BIRDS.GENERO         SP1        SP2                       XLOCAL
  Thamnomanes      caesius    glaucus                0°32'S52°12'W 
  Thamnomanes   ardesiacus  obidensis                0°10'S51°50'W
  Thamnomanes   ardesiacus  obidensis               01°40'N51°23'W
  Thamnomanes   ardesiacus  obidensis               01°40'N51°23'W"),
  header = TRUE, stringsAsFactors = FALSE
)

listaC <- strsplit(dados$XLOCAL, "(?<=N|S)", perl = TRUE)
dados$Lat  <- unlist(lapply(listaC, '[', 1))
dados$Long <- unlist(lapply(listaC, '[', 2))

> dados
  BIRDS.GENERO        SP1       SP2         XLOCAL     Lat    Long
1  Thamnomanes    caesius   glaucus  0°32'S52°12'W  0°32'S 52°12'W
2  Thamnomanes ardesiacus obidensis  0°10'S51°50'W  0°10'S 51°50'W
3  Thamnomanes ardesiacus obidensis 01°40'N51°23'W 01°40'N 51°23'W
4  Thamnomanes ardesiacus obidensis 01°40'N51°23'W 01°40'N 51°23'W
    
29.11.2018 / 12:29
2

I could not get a straightforward solution, I used the separate function of the tidyverse package twice and I made some manipulations on the resulting variables to get the result you expect.

Being dados your bank, follow code:

library(tidyverse)

dados %>% 
  separate("XLOCAL", into = c("Lat1", "Long1"), sep = "N", remove = F, extra = "drop", fill = "right") %>% 
  separate("XLOCAL", into = c("Lat2", "Long2"), sep = "S", remove = F, extra = "drop", fill = "right") %>% 
  mutate( Lat1 = paste0(Lat1, "N"), Lat2 = paste0(Lat2, "S") ) %>% 
  mutate( Long2 = ifelse(is.na(Long2), "", Long2), Long1 = ifelse(is.na(Long1), "", Long1) ) %>% 
  mutate( Lat = ifelse(Long1 == "", Lat2, Lat1), Long = paste0(Long2, Long1) ) %>% 
  select(BIRDS.GENERO:XLOCAL, Lat, Long)
  • The first separate is breaking XLOCAL , in two variables ( Lat1 and Long1 ), through the N separator;
  • The second separate is breaking XLOCAL , in two variables ( Lat2 and Long2 ), through the S separator;
  • The first mutate is including the N and S indicators, respectively, Lat1 and Lat2 , respectively, since the separate function excludes the separators used;
  • the second mutate is modifying what is NA for a blank field;
  • The third mutate is creating the variables Lat and Long using the variables "auxiliary" Lat1 , Lat2 , Long1 , Long2 ;
  • Finally, the select function is selecting the variables of interest
28.11.2018 / 19:10