R - match and add string

n <- c("alberto queiroz souza","bernardo josé silva","josé césar pereira","alberto, q-s.","alberto, queiroz souza","alberto, q. s.","alberto, q c", "bernardo, j. s.", "bernardo, j. silva", "josé, c. p.", "josé, c. pereira")

I need to find every element of the vector n in df:

df <- data.frame(Titulo.1 = c("ALBERTO QUEIROZ SOUZA (ALBERTO, Q-S.) - ATUA NA EMPRESA.","B. J SILVA (BERNARDO, J. SILVA)", "JOSÉ CÉSAR PEREIRA (JOSÉ, C. P.)", "LENILTON FRAGOSO (FRAGOZO, LENILTON)","ALKMIM, MARCIO"),
                  Titulo.2 = c("BERNARDO JOSÉ SILVA (BERNARDO, J. S.)","ALBERTO QUEIROZ SOUZA (ALBERTO, QUEIROZ SOUZA)","JOSÉ CÉSAR PEREIRA (JOSÉ, C. PEREIRA)","LENILTON FRAGOSO (FRAGOZO, LENILTON)","ALKMIM, MARCIO"),
                  Titulo.3 = c("LENILTON FRAGOSO (FRAGOZO, L)","BERNARDO JOSÉ SILVA (BERNARDO, J. S.) - ATUA NA EMPRESA","ALBERTO QUEIROZ SOUZA (ALBERTO, Q. S.)","JOSÉ CÉSAR PEREIRA (J. C. P.)","ALKMIM, MARCIO"),
                  Titulo.4 = c("JOSÉ CÉSAR PEREIRA (JOSÉ, CÉZAR PEREIRA)","LENILTON FRAGOSO (FRAGOZO, LENILTON) - ATUA NA FIOCRUZ","ALKMIM, MARCIO","ALBERTO (ALBERTO, Q C)","BERNARDO JOSÉ SILVA (B, J. S.)"),
                  Titulo.5 = c("BERNARDO JOSÉ SILVA (BERNARDO, JS)","JOSÉ CÉSAR PEREIRA (JOSÉ, C. PEREIRA) - ATUA NA FIOCRUZ","LENILTON FRAGOSO (FRAGOZO, L.)","ALKMIM, MARCIO","ALBERTO QUEIROZ SOUZA (ALBERTO, Q-S.)"),
                 stringsAsFactors = FALSE)

When a match is found, I need to append "- acts in the company" to it ("- ATUA NA EMPRESA" in the data), giving "josé, cp - acts in the company", for example.

If the matching entry in df already contains "- acts in the company", nothing needs to be added.

I'm trying to get the matching to work first, with something like this:

for (x in n) {
  result <- sapply(df, gsub, pattern = x, ...)
  # or
  result <- sapply(df, str_replace, pattern = x, ...)
}

But it is proving difficult.

    
asked by anonymous 16.08.2016 / 22:52

1 answer


The code below does the following: for each cell of each column, it extracts the name (the text before the parenthesis), looks it up in the vector n and, for the names that are found, checks whether the cell already says the person acts in the company; if it does not, the text is appended. As already mentioned in the comments, you will get better results if you clean your data first.

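textm <- "ATUA NA EMPRESA"
ndf <- as.data.frame(lapply(df, function(nc) {  # nc is one column, e.g. df[, 1]
  nct <- nc
  # lower-cased full name: the text before " (" in each cell
  ncm <- sapply(nc, function(nx)
    tolower(unlist(strsplit(nx, " (", fixed = TRUE))[1]))
  enc <- ncm %in% n             # cells whose name appears in n
  emp <- grepl(textm, nc[enc])  # matched cells that already contain the text
  nct[enc] <- ifelse(emp, nc[enc], paste(nc[enc], " - ", textm, ".", sep = ""))
  nct
}), stringsAsFactors = FALSE)
ndf[, 1]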

[1] "ALBERTO QUEIROZ SOUZA (ALBERTO, Q-S.) - ATUA NA EMPRESA."
[2] "B. J SILVA (BERNARDO, J. SILVA)"                         
[3] "JOSÉ CÉSAR PEREIRA (JOSÉ, C. P.) - ATUA NA EMPRESA."     
[4] "LENILTON FRAGOSO (FRAGOZO, LENILTON)"                    
[5] "ALKMIM, MARCIO"   
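
Note that the code above compares only the full name before the parenthesis with n, so the abbreviated entries of n (such as "alberto, q-s.") are never used, which is why the second row is left untouched. If you also want to match on those, one possible variation is to test whether any entry of n occurs as a literal substring of the lower-cased cell. This is only a sketch: the helper add_suffix and the small n_demo and cells vectors are hypothetical names introduced for illustration, and substring matching can over-match when entries of n are very short.

```r
# Hypothetical helper (not part of the original answer): append `suffix` to
# every cell that contains any entry of `names_vec`, unless the cell already
# contains the suffix.
add_suffix <- function(cells, names_vec, suffix = "ATUA NA EMPRESA") {
  low <- tolower(cells)
  # TRUE where at least one name occurs as a literal (fixed) substring
  hit <- vapply(low, function(cell) {
    any(vapply(names_vec, grepl, logical(1), x = cell, fixed = TRUE))
  }, logical(1), USE.NAMES = FALSE)
  done <- grepl(suffix, cells, fixed = TRUE)  # suffix already present
  todo <- hit & !done
  cells[todo] <- paste0(cells[todo], " - ", suffix)
  cells
}

# Small illustrative input (ASCII only, to avoid locale issues with tolower)
n_demo <- c("alberto queiroz souza", "alberto, q-s.", "bernardo, j. s.")
cells <- c("ALBERTO QUEIROZ SOUZA (ALBERTO, Q-S.) - ATUA NA EMPRESA.",
           "B. J SILVA (BERNARDO, J. S.)",
           "ALKMIM, MARCIO")
out <- add_suffix(cells, n_demo)
print(out)
```

To apply it to the whole data frame while keeping it a data frame, something like df[] <- lapply(df, add_suffix, names_vec = n) should work. Using fixed = TRUE matters here because entries such as "alberto, q. s." contain dots, which would otherwise be treated as regular-expression wildcards.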
    
19.08.2016 / 22:01