Suppose I have this extracted URL:
/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW
I want only the part that begins with a1G. Does anyone know how I can extract just that part?
You can do this using the stringr package and regular expressions. In your case, I would do it like this:
s <- "/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW"
stringr::str_extract(s, "a1G\\S+\\s*")
[1] "a1G57000003DE4QEAW"
This code works even if s is a vector, so it also works on a data.frame column, as follows:
df$extrair <- stringr::str_extract(df$url, "a1G\\S+\\s*")
Note that if you do not have the stringr package installed, you will need to install it first with install.packages("stringr").
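As a rough, self-contained sketch of the data.frame case: the data frame below and its second URL are made up for illustration, and the pattern is shortened to "a1G\\S+" on the assumption that trailing whitespace is not wanted. It shows that str_extract returns one value per row and gives NA where the pattern is not found.

```r
library(stringr)

# Hypothetical data: the second URL is invented and contains no "a1G" id.
df <- data.frame(
  url = c("/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW",
          "/ac/rio-branco/no-id-here"),
  stringsAsFactors = FALSE
)

# "a1G" followed by one or more non-whitespace characters.
# Note the doubled backslash: R string literals need "\\S" to pass \S to the regex engine.
df$extrair <- str_extract(df$url, "a1G\\S+")

print(df$extrair)
```

Because the result has the same length as the input, it can be assigned directly to a new column without any realignment.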
Extracting part of a string using only the base package is somewhat cumbersome, but possible. I chose a simpler regular expression than Daniel's, since you were not very specific about the format. It would look like this:
> s <- "/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW"
> regmatches(s, gregexpr("a1G.+", s))
[[1]]
[1] "a1G57000003DE4QEAW"
Note that the result is a list with one element for each string in the s vector, and each element holds all occurrences of the regular expression in that string. If you just want a vector as output, you can use unlist:
> s <- c("/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW", "abcsda1G000")
> regmatches(s, gregexpr("a1G.+", s))
[[1]]
[1] "a1G57000003DE4QEAW"
[[2]]
[1] "a1G000"
> unlist(regmatches(s, gregexpr("a1G.+", s)))
[1] "a1G57000003DE4QEAW" "a1G000"
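One caveat with the unlist approach: strings with no match contribute nothing to the result, so the output can end up shorter than the input. If the result needs to line up with the original vector (for example, to store it in a data.frame column), a possible base-R sketch is to use regexpr (first match only) and fill in NA by hand. The input vector and the name out below are only illustrative.

```r
# Hypothetical input: the middle element has no "a1G" in it.
s <- c("/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW",
       "no-match-here",
       "abcsda1G000")

# regexpr returns the position of the first match, or -1 when there is none.
m <- regexpr("a1G.+", s)

# Start with NA everywhere, then fill in only the positions that matched.
# regmatches(s, m) returns just the matched substrings, in order.
out <- rep(NA_character_, length(s))
out[m != -1] <- regmatches(s, m)

print(out)
```

This keeps one output element per input string, which mirrors how stringr::str_extract behaves.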