Regular expression with strsplit

How do I assign a regular expression to a variable so that I can split off the name of the city?

cid <- c("cidade1..SP.Brasil", "cidade2...SP.Brasil", "cidade3..SPDF.Brasil", "cidade4...SPDF.Brasil")

In Sublime Text, for example, this pattern works:

\.{3}[A-Z]{4}|\.{3}[A-Z]{2}|\.{2}[A-Z]{4}|\.{2}[A-Z]{2}

But I cannot assign it to a variable in RStudio. These are my attempts:

pattern <- ".{3}[A-Z]{4}|.{3}[A-Z]{2}|.{2}[A-Z]{4}|.{2}[A-Z]{2}" 
pattern <- "\.{3}[A-Z]{4}|\.{3}[A-Z]{2}|\.{2}[A-Z]{4}|\.{2}[A-Z]{2}"
pattern <- "\.{3}[A-Z]{4}|\.{3}[A-Z]{2}|\.{2}[A-Z]{4}|\.{2}[A-Z]{2}"
pattern <- regex(".{3}[A-Z]{4}|.{3}[A-Z]{2}|.{2}[A-Z]{4}|.{2}[A-Z]{2}")
pattern <- regex("\.{3}[A-Z]{4}|\.{3}[A-Z]{2}|\.{2}[A-Z]{4}|\.{2}[A-Z]{2}")
pattern <- regex("\.{3}[A-Z]{4}|\.{3}[A-Z]{2}|\.{2}[A-Z]{4}|\.{2}[A-Z]{2}")

c <- strsplit(cid, pattern, fixed = TRUE)
    
asked by anonymous 13.01.2017 / 13:42

1 answer

I solved the problem without regex.

cid  <-  c("cidade1..SP.Brasil", "cidade2...SP.Brasil", "cidade3..SPDF.Brasil", 
"cidade4...SPDF.Brasil")

primeiro <- function(x){
  return(x[[1]])
}

unlist(lapply(strsplit(cid, split="..", fixed=TRUE), FUN=primeiro))
[1] "cidade1" "cidade2" "cidade3" "cidade4"

I used the literal string ".." as the separator; fixed=TRUE makes strsplit match it as plain text rather than as a regular expression. strsplit returns a list of 4 elements, each of which is a character vector of length two. Since the city is always the first element of that vector, I wrote a function called primeiro, which returns only the first element of each result vector.
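To make the intermediate result easier to see, here is a small sketch of the same idea. The names partes and cidades are just illustrative, and vapply is swapped in for lapply plus unlist as one possible alternative, not as a correction of the code above.

```r
cid <- c("cidade1..SP.Brasil", "cidade2...SP.Brasil",
         "cidade3..SPDF.Brasil", "cidade4...SPDF.Brasil")

# fixed = TRUE: ".." is matched as two literal dots, not as a regex
partes <- strsplit(cid, split = "..", fixed = TRUE)

# Each list element is a character vector: the city, then the remainder.
# With three dots in the input, the remainder keeps the extra leading dot.
partes[[2]]
# [1] "cidade2"    ".SP.Brasil"

# vapply returns a character vector directly, so unlist() is not needed
cidades <- vapply(partes, function(x) x[1], character(1))
cidades
# [1] "cidade1" "cidade2" "cidade3" "cidade4"
```

Declaring character(1) makes vapply fail loudly if some element does not yield exactly one string, which can be handy when the input is less regular than in this example.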

The lapply and unlist commands are used, respectively, to apply the primeiro function to each list element created by strsplit and to flatten the final result into a vector.
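For comparison, a regex version along the lines of the question should also work, under two assumptions about what went wrong there: inside an R string each backslash has to be doubled ("\\." for a literal dot), and fixed = TRUE has to be dropped, because it makes strsplit treat the whole pattern as literal text. The regex() wrapper comes from the stringr package and is not needed for base strsplit. This is only a sketch; the sub() line at the end is an extra alternative that is not in the question.

```r
cid <- c("cidade1..SP.Brasil", "cidade2...SP.Brasil",
         "cidade3..SPDF.Brasil", "cidade4...SPDF.Brasil")

# Same pattern as in the question, with the backslashes doubled for R
pattern <- "\\.{3}[A-Z]{4}|\\.{3}[A-Z]{2}|\\.{2}[A-Z]{4}|\\.{2}[A-Z]{2}"

# No fixed = TRUE here: the pattern must be interpreted as a regex
partes <- strsplit(cid, pattern)

# The city is the first piece of each split
cidades <- vapply(partes, function(x) x[1], character(1))
cidades
# [1] "cidade1" "cidade2" "cidade3" "cidade4"

# Alternative: delete everything from the first ".." to the end of the string
cidades_sub <- sub("\\.{2}.*$", "", cid)
```

If only the city is needed, the sub() form avoids building the intermediate list at all.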

    
13.01.2017 / 16:09