R - match and replace string

2

I have this vector:

n <- c("alberto queiroz souza (alberto, q-s.)", 
       "alberto queiroz souza (alberto, queiroz souza)", 
       "alberto queiroz souza (alberto, q. s.)", 
       "alberto queiroz souza (alberto, q c)", 
       "bernardo josé silva (bernardo, j. s.)", 
       "bernardo josé silva (bernardo, j. silva)", 
       "josé césar pereira (josé, c. p.)", 
       "josé césar pereira (josé, c. pereira)")

For each element, I would like to separate the name from the text in parentheses.

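library(stringr)  # provides str_split_fixed()

# In an R string literal the backslash itself must be escaped: "\\(" not "\("
n <- str_split_fixed(as.character(n), " \\(", 2)

n <- c(strsplit(as.character(n), "\\)$"))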

I do not know how to do this split properly, or how to turn the result into a single vector without duplicate elements.

The result would look like this:

result <- c("alberto queiroz souza", 
            "bernardo josé silva", 
            "josé césar pereira", 
            "alberto, q-s.", 
            "alberto, queiroz souza",
            "alberto, q. s." ...... )
    
asked by anonymous 16.08.2016 / 17:24

2 answers

3

Try something like this:

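# Split each element at " (" into the name and the parenthesised part
ns <- sapply(n, function(nx)
  unlist(strsplit(nx, " (", fixed = TRUE))
)
ns <- t(unique(ns))          # one row per element: column 1 = name, column 2 = rest
row.names(ns) <- NULL
res <- apply(ns, 2, unique)  # unique values per column (a list here, since the lengths differ)
res[[2]] <- gsub("\\)", "", res[[2]])  # drop the closing parenthesis
unlist(res)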

 [1] "alberto queiroz souza"  "bernardo josé silva"    "josé césar pereira"    
 [4] "alberto, q-s."          "alberto, queiroz souza" "alberto, q. s."        
 [7] "alberto, q c"           "bernardo, j. s."        "bernardo, j. silva"    
[10] "josé, c. p."            "josé, c. pereira"
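
If the matrix and transpose steps feel heavy, the same result can probably be reached with two calls to sub() in base R. This is only a sketch, assuming every element has exactly one "(" and ends in ")". The names full_names and short_forms are made up for illustration, and n below is a shortened copy of the vector from the question.

```r
# Shortened copy of the question's vector, for illustration only
n <- c("alberto queiroz souza (alberto, q-s.)",
       "alberto queiroz souza (alberto, q. s.)",
       "bernardo josé silva (bernardo, j. s.)")

# Remove everything from " (" to the end: what is left is the full name
full_names <- sub(" \\(.*$", "", n)

# Keep only the text between "(" and the final ")" via a capture group
short_forms <- sub("^.*\\((.*)\\)$", "\\1", n)

# Full names first, then short forms, with duplicates dropped
result <- unique(c(full_names, short_forms))
result
```

Note the doubled backslashes: inside an R string literal, "\\(" is how the regex \( is written.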
    
16.08.2016 / 18:28
0

I would do it this way:

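library(magrittr)
library(stringr)

# Part before the parenthesis: match up to "(", then drop the trailing " ("
# str_extract() returns a character vector with one match per element
antes_parenteses <- n %>%
  str_extract(".{1,}\\(") %>%
  str_replace_all(fixed(" ("), "")

# Part inside the parentheses: match "(...)", then drop both parentheses
parenteses <- n %>%
  str_extract("\\(.{1,}\\)") %>%
  str_replace_all(fixed("("), "") %>%
  str_replace_all(fixed(")"), "")

# Combine both parts and drop duplicates
resultado <- c(antes_parenteses, parenteses) %>% unique()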

Instead of splitting the strings, I use regular expressions to extract each part.

> resultado
[1] "alberto queiroz souza"  "bernardo josé silva"    "josé césar pereira"     "alberto, q-s."         
[5] "alberto, queiroz souza" "alberto, q. s."         "alberto, q c"           "bernardo, j. s."       
[9] "bernardo, j. silva"     "josé, c. p."            "josé, c. pereira"
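
The same extraction idea can likely be written with a single pattern and two capture groups. The sketch below uses base R's regexec() and regmatches() rather than stringr, and assumes every element has the form "name (short form)". The name parts is made up for illustration, and n is a shortened copy of the vector from the question.

```r
# Shortened copy of the question's vector, for illustration only
n <- c("alberto queiroz souza (alberto, q-s.)",
       "alberto queiroz souza (alberto, q. s.)",
       "bernardo josé silva (bernardo, j. s.)")

# One pattern, two capture groups: (name) and (text inside the parentheses)
m <- regmatches(n, regexec("^(.*) \\((.*)\\)$", n))

# Each list element is c(full match, group 1, group 2); stack them into a matrix
parts <- do.call(rbind, m)

# Column 2 holds the names, column 3 the text from the parentheses
resultado <- unique(c(parts[, 2], parts[, 3]))
resultado
```

An element that does not match the pattern yields an empty vector and is silently dropped by rbind(), so this is only safe when the input format is known to be regular.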
    
16.08.2016 / 18:30