Filter different texts in different positions in R

5

Good afternoon. I have the following data:

NOME  <- c("MARIA 1001", "MARIA 1002A", "JOSE 1003B", "PEDRO 1003", "CARLOS 1019J", "ANTONIO 50", "MARIA 80")
VALOR <- c(10, 20, 30, 40, 50, 60, 70)
dados <- data.frame(NOME, VALOR)

I need to keep the rows whose number is between 1001 and 1019, regardless of where the number appears in the text (beginning, middle, or end). The expected result excludes only the "ANTONIO 50" and "MARIA 80" rows. How should I write this filter? Thank you.

    
asked by anonymous 28.12.2017 / 15:14

2 answers

4

I would do it like this:

library(stringr)
library(dplyr)

dados %>%
  filter(str_extract(NOME, "\\d{1,}") %in% 1001:1019)

The function str_extract extracts the first match of a regex pattern from each string. In this case the pattern is \d{1,}, that is, one or more digits (the backslash has to be doubled inside an R string, hence "\\d{1,}").
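
To see what the filter is actually comparing, here is a small sketch of the extraction step on its own. The vector nomes is just a shortened copy of the question's data, and the as.integer conversion is an optional extra, not part of the answer above; it only makes the numeric comparison explicit.

```r
library(stringr)

# Shortened copy of the data from the question (illustration only).
nomes <- c("MARIA 1001", "MARIA 1002A", "JOSE 1003B", "ANTONIO 50")

# First run of digits in each string; the result is a character vector.
codigos <- str_extract(nomes, "\\d{1,}")
# codigos holds "1001", "1002", "1003", "50"

# Optional: convert to integer so the range test compares numbers.
manter <- as.integer(codigos) %in% 1001:1019
# manter holds TRUE, TRUE, TRUE, FALSE

nomes[manter]
```

A string with no digits at all would give NA from str_extract, and NA %in% 1001:1019 is FALSE, so such rows would simply be dropped.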

    
28.12.2017 / 18:39
3

Try the following. First we use gsub to remove everything that is not a digit from dados$NOME, which leaves only the numbers. Then we filter with a logical index.

num <- as.numeric(gsub("[^[:digit:]]", "", dados$NOME))
dados2 <- dados[1001 <= num & num <= 1019, ]
rm(num)    # no longer needed

dados2 
#          NOME VALOR
#1   MARIA 1001    10
#2  MARIA 1002A    20
#3   JOSE 1003B    30
#4   PEDRO 1003    40
#5 CARLOS 1019J    50
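
One thing to keep in mind with the gsub approach: it deletes every non-digit, so if a name ever contained two separate numbers they would be glued together. The sketch below uses a made-up third value (not in the question's data) to show this, and a possible base R alternative with regexpr and regmatches that keeps only the first run of digits.

```r
# The third value is hypothetical; the question's data has one number per name.
x <- c("MARIA 1001", "CARLOS 1019J", "JOSE 1003 SALA 2")

# gsub strips all non-digits, so "1003" and "2" are concatenated.
todos <- as.numeric(gsub("[^[:digit:]]", "", x))
# todos holds 1001, 1019, 10032

# regexpr finds the first run of digits; regmatches extracts it.
# Note: regmatches drops elements with no match, so this assumes
# every string contains at least one digit.
primeiro <- as.numeric(regmatches(x, regexpr("[[:digit:]]+", x)))
# primeiro holds 1001, 1019, 1003

x[1001 <= primeiro & primeiro <= 1019]
```

For the data in the question both versions give the same result, since each name has a single number.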
    
28.12.2017 / 16:00