How to extract a specific excerpt from a string


Suppose I have extracted this URL:

/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW

I want only the part that starts with a1G. Does anyone know how to get just that part?

    
asked by anonymous 26.02.2016 / 21:14

2 answers


You can do this using the stringr package and regular expressions.

In your case, I would do it like this:

s <- "/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW"
stringr::str_extract(s, "a1G\\S+\\s*")
[1] "a1G57000003DE4QEAW"

This code also works when s is a vector, so you can apply it to a data.frame column like this:

df$extrair <- stringr::str_extract(df$url, "a1G\\S+\\s*")

If you don't have the stringr package installed yet, install it first with install.packages("stringr").
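A small sketch of the data.frame case, assuming a column called url as in the line above; the second row is made up to show what happens when a URL has no a1G part. Here the trailing \\s* is left out of the pattern, so whitespace after the ID can never end up in the result. str_extract() returns NA for rows without a match, so the new column still has exactly one value per row:

```r
library(stringr)

# Hypothetical data: the column name `url` follows the answer above,
# the second row is invented to show the no-match case
df <- data.frame(
  url = c("/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW",
          "/ac/rio-branco/no-id"),
  stringsAsFactors = FALSE
)

# Vectorised: one result per row, NA where the pattern is not found
df$extrair <- str_extract(df$url, "a1G\\S+")
df$extrair
#> [1] "a1G57000003DE4QEAW" NA
```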

    
26.02.2016 / 21:42

Extracting part of a string with base R alone is a bit cumbersome, but possible. Since you weren't very specific about the format, I used a simpler regular expression than Daniel's: it matches a1G followed by everything up to the end of the string. It would look like this:

> s <- "/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW"
> regmatches(s, gregexpr("a1G.+", s))
[[1]]
[1] "a1G57000003DE4QEAW"
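If you would rather not deal with a list at all, base R's sub() with a capture group is another option. This is a sketch, not the method used in this answer: it assumes each URL contains a1G at most once (the greedy .* would otherwise keep only the last occurrence), and the second URL is invented to show the no-match case. Because sub() returns non-matching strings unchanged, grepl() is used to replace those with NA:

```r
s <- c("/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW",
       "/ac/rio-branco/no-id")

# sub() keeps everything from the (last) "a1G" to the end of the string;
# strings without "a1G" would come back unchanged, so mask them with NA
ids <- ifelse(grepl("a1G", s, fixed = TRUE),
              sub(".*(a1G.*)", "\\1", s),
              NA_character_)
ids
#> [1] "a1G57000003DE4QEAW" NA
```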

Note that the result is a list with one element per string in s, each holding all matches of the regular expression in that string. If you just want a character vector, wrap the result in unlist:

> s <- c("/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW", "abcsda1G000")
> regmatches(s, gregexpr("a1G.+", s))
[[1]]
[1] "a1G57000003DE4QEAW"

[[2]]
[1] "a1G000"

> unlist(regmatches(s, gregexpr("a1G.+", s)))
[1] "a1G57000003DE4QEAW" "a1G000"     
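One caveat with unlist: strings with no match simply disappear, so the result can be shorter than s and no longer lines up with it. A sketch using regexpr(), which records only the first match per string and returns -1 where there is none, keeps the positions aligned by filling in NA; the middle string is a made-up example:

```r
s <- c("/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW",
       "no-match-here",
       "abcsda1G000")

# regexpr() keeps only the first match per string (-1 where there is none)
m <- regexpr("a1G.+", s)

# With a regexpr() result, regmatches() already returns a character vector,
# but it silently skips the string that did not match
regmatches(s, m)
#> [1] "a1G57000003DE4QEAW" "a1G000"

# Keep one entry per input string: NA where nothing matched
out <- rep(NA_character_, length(s))
out[m != -1] <- regmatches(s, m)
out
```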
    
27.02.2016 / 02:31