How do I filter data by part of the text in a variable?

3

How can I, for example, list only the observations whose Nome variable contains the word Silva?

Nome                Nota
    João Silva      9
   Pedro Souza      8
     Ana Silva      6
Isabela Cabral      10
  Paulo Santos      5

I would like to print just a table like this:

Nome                Nota
    João Silva      9
     Ana Silva      6

I'm new here, so I apologize for the way the problem is presented. Thank you in advance!

    
asked by anonymous 21.07.2017 / 01:49

3 answers

7

Suppose your dataset is called dados:

dados <- data.frame(Nome=c("João Silva", "Pedro Souza", "Ana Silva",
  "Isabela Cabral", "Paulo Santos"), Nota=c(9, 8, 6, 10, 5))

Use the grep function to find which rows contain the word that interests you. In this case, rows 1 and 3:

grep("Silva", dados$Nome)
[1] 1 3

Select only these rows from the original dataset and your problem is solved:

dados[grep("Silva", dados$Nome), ]
        Nome Nota
1 João Silva    9
3  Ana Silva    6
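
As a complement to the grep approach above, here is a minimal sketch that uses grepl, which returns a logical vector instead of row positions. The names tem_silva, resultado and sem_silva are illustrative and not part of the original answer; fixed = TRUE is an assumption that the search term should be treated as literal text rather than a regular expression.

```r
# Same example data as in the answer above
dados <- data.frame(
  Nome = c("João Silva", "Pedro Souza", "Ana Silva",
           "Isabela Cabral", "Paulo Santos"),
  Nota = c(9, 8, 6, 10, 5)
)

# grepl returns TRUE/FALSE for every row; fixed = TRUE matches "Silva" literally
tem_silva <- grepl("Silva", dados$Nome, fixed = TRUE)

# Keep only the rows where the name contains "Silva"
resultado <- dados[tem_silva, ]

# A logical index can also be negated safely to get the remaining rows
sem_silva <- dados[!tem_silva, ]
```

One reason to prefer the logical form when excluding rows: dados[-grep(...), ] returns zero rows when nothing matches, whereas dados[!grepl(...), ] keeps every row in that case.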
    
21.07.2017 / 04:19
7

Building on Marcus's answer, I want to draw attention to something that often goes unnoticed by "new" users of R: the variable dados$Nome is of class factor (the default for strings in data.frame before R 4.0.0). This matters for the final result, because after the rows that are not of interest have been removed, the levels of the variable are still there. See for yourself with code:

dados2 <- dados[grep("Silva", dados$Nome), ]
str(dados2)
'data.frame':   2 obs. of  2 variables:
 $ Nome: Factor w/ 5 levels "Ana Silva","Isabela Cabral",..: 3 1
 $ Nota: num  9 6

dados2$Nome
[1] João Silva Ana Silva 
Levels: Ana Silva Isabela Cabral João Silva Paulo Santos Pedro Souza

If you want to drop these unused levels, you can use the droplevels function.

dados2$Nome <- droplevels(dados2$Nome)
dados2$Nome
[1] João Silva Ana Silva 
Levels: Ana Silva João Silva
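
To illustrate why the leftover levels matter, here is a small sketch, assuming the column was created as a factor (stringsAsFactors = TRUE is set explicitly so it behaves the same in any R version). The names contagem and dados3 are illustrative only.

```r
dados <- data.frame(
  Nome = c("João Silva", "Pedro Souza", "Ana Silva",
           "Isabela Cabral", "Paulo Santos"),
  Nota = c(9, 8, 6, 10, 5),
  stringsAsFactors = TRUE   # force a factor column for this illustration
)

dados2 <- dados[grep("Silva", dados$Nome), ]

# table() still reports all five original levels, three of them with count 0
contagem <- table(dados2$Nome)

# droplevels() also accepts a whole data frame and cleans every factor column
dados3 <- droplevels(dados2)
nlevels(dados3$Nome)
```

Calling droplevels on the data frame rather than on a single column can be convenient when several factor columns are involved.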

The other solution is to deal with this from the start: when creating the data.frame dados, use the stringsAsFactors argument.

dados <- data.frame(Nome=c("João Silva", "Pedro Souza", "Ana Silva",
  "Isabela Cabral", "Paulo Santos"), Nota=c(9, 8, 6, 10, 5),
  stringsAsFactors = FALSE)   ## the default was TRUE before R 4.0.0

Then just use Marcus's solution.
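
If the data frame already exists with a factor column, a possible alternative is to convert that column to character before filtering. This is only a sketch under that assumption, not something proposed in the answers here.

```r
dados <- data.frame(
  Nome = c("João Silva", "Pedro Souza", "Ana Silva",
           "Isabela Cabral", "Paulo Santos"),
  Nota = c(9, 8, 6, 10, 5),
  stringsAsFactors = TRUE   # simulate a data frame that already holds a factor
)

# Convert the factor to plain character strings
dados$Nome <- as.character(dados$Nome)

dados2 <- dados[grep("Silva", dados$Nome), ]

# A character column has no levels, so nothing is left over after filtering
is.character(dados2$Nome)
is.null(levels(dados2$Nome))
```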

    
21.07.2017 / 10:50
1

Using the df created by @Marcos, you can also work with the tidyverse, which avoids the factor issue pointed out by @Rui because a tibble keeps strings as character:

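    library(tidyverse)   # also attaches stringr, so no separate library() call is needed
    dados <- tibble(Nome=c("João Silva", "Pedro Souza", "Ana Silva",
                           "Isabela Cabral", "Paulo Santos"),
                    Nota=c(9, 8, 6, 10, 5)) %>%
      # str_which returns the positions of the rows whose Nome matches "Silva"
      .[str_which(.$Nome, "Silva"), ]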
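
The same selection can also be sketched with dplyr's filter and stringr's str_detect, which some readers may find easier to follow than indexing with a dot. This assumes the tidyverse packages are installed; the name silvas is illustrative.

```r
library(tidyverse)

dados <- tibble(
  Nome = c("João Silva", "Pedro Souza", "Ana Silva",
           "Isabela Cabral", "Paulo Santos"),
  Nota = c(9, 8, 6, 10, 5)
)

# str_detect returns TRUE for names containing "Silva"; filter keeps those rows
silvas <- dados %>%
  filter(str_detect(Nome, "Silva"))
```

Here dados keeps all rows and the filtered result is stored separately, so the original data remain available for other queries.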
    
30.07.2017 / 14:27