Searching for strings in R

I need to search a data frame column where the text may not match exactly. For example, df$titulo == "SE" & df$titulo == "projeto de pesquisa" finds nothing. I have already tried using like instead of ==, and I also tried df$titulo == "%projeto de pesquisa%", but neither works. The subset function returns nothing either.

To make clearer what I mean: SQL has a LIKE operator that matches part of a string, instead of requiring equality as = does.

    
asked by anonymous 05.11.2017 / 02:34

2 answers

I got it working with agrep:

x <- agrep(pattern = "projeto de pesquisa", df$titulo, ignore.case = TRUE,
           value = TRUE, fixed = TRUE)

ignore.case = TRUE makes the match case-insensitive, and value = TRUE returns the matching strings themselves instead of their positions.
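
To filter the rows of the data frame rather than just get the matching titles, one option is to leave value at its default (FALSE) and index with the positions agrep returns. This is only a minimal sketch: the df below is a made-up stand-in for the real data, and encontrados is a hypothetical name.

```r
# Hypothetical data frame standing in for the asker's df (assumption)
df <- data.frame(
  titulo = c("Projeto de Pesquisa 2017", "relatorio final", "SE - projeto de pesquisa"),
  stringsAsFactors = FALSE
)

# With value = FALSE (the default), agrep returns the positions of the matches
idx <- agrep("projeto de pesquisa", df$titulo, ignore.case = TRUE)

# Use those positions to keep only the matching rows
encontrados <- df[idx, , drop = FALSE]
print(encontrados)
```

drop = FALSE keeps the result a data frame even though df has a single column.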

    
05.11.2017 / 15:30

For a simple match, you can use the str_subset function of the stringr package:

library(stringr)
texto <- c("abc projeto de pesquisa cdf", "123 projeto de pesquisa", "progeto de pesquisa")
# Keep only the elements that contain the pattern, ignoring case
str_subset(texto, pattern = regex("projeto de pesquisa", ignore_case = TRUE))

Note, however, that the third string, which contains a spelling error ("progeto"), is not matched. The agrep you are using is more permissive in that respect: it performs approximate matching based on the Levenshtein (edit) distance and can capture the third case, if that is what you want.
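
As a rough illustration of the difference, the sketch below runs both kinds of match on the same vector using base R only. grepl stands in here for the exact, LIKE-style substring search, and agrep is called with its default max.distance of 0.1, a fraction of the pattern length. The names exato and aproximado are just illustrative.

```r
texto <- c("abc projeto de pesquisa cdf", "123 projeto de pesquisa", "progeto de pesquisa")

# Exact, case-insensitive substring match, similar in spirit to SQL's LIKE '%...%'
exato <- texto[grepl("projeto de pesquisa", texto, ignore.case = TRUE)]

# Approximate match: a small number of edits is tolerated, so the
# misspelled "progeto" (one substitution) is also returned
aproximado <- agrep("projeto de pesquisa", texto, ignore.case = TRUE, value = TRUE)

print(exato)
print(aproximado)
```

Lowering max.distance makes agrep stricter, and raising it makes it more tolerant, at the cost of more false positives.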

    
07.11.2017 / 06:02