Search for an expression in several elements of a list

9

Guys, I have a problem. I have 200 spreadsheets with some data from a search that I'm importing into R. Because they have different columns, I store each spreadsheet as a separate element of a list. I need to look up a name that can be in any of the spreadsheets and get back which element of the list contains that name. How can I do this?

For example, find out where José da Silva is:

df1 <- data.frame(nome = c("José da Silva", "Maria da Silva"),
              idade = c(45, 54))
df2 <- data.frame(nome_completo = c("Mauro Pereira", "João Paulo"),
              idade = c(30, 12))

lista <- list()
lista[[1]] <- df1
lista[[2]] <- df2
    
asked by anonymous 17.09.2018 / 22:15

3 answers

8

Use the function which inside lapply:

lapply(lista, function(x) which(x == "José da Silva"))
[[1]]
[1] 1

[[2]]
integer(0)

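This is an option for finding an exact term, as in your example "José da Silva". Elements that return integer(0) do not contain the name; for the others, the number returned is the position of the match within the data.frame, counted down the columns.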
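
If only the position in the list is needed, the output above can be reduced to the indices of the non-empty results. A minimal sketch, assuming the lista from the question; the names posicoes, elementos and onde are illustrative only, and arr.ind = TRUE is used to get the row and column of the match instead of a single position:

```r
df1 <- data.frame(nome = c("José da Silva", "Maria da Silva"),
                  idade = c(45, 54))
df2 <- data.frame(nome_completo = c("Mauro Pereira", "João Paulo"),
                  idade = c(30, 12))
lista <- list(df1, df2)

# One integer vector per list element; integer(0) means "not found here"
posicoes <- lapply(lista, function(x) which(x == "José da Silva"))

# Keep only the indices of the list elements with at least one match
elementos <- which(lengths(posicoes) > 0)
elementos
# [1] 1

# Row and column of each match inside the first matching element
onde <- which(lista[[elementos[1]]] == "José da Silva", arr.ind = TRUE)
onde
```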

    
17.09.2018 / 22:47
5

I would do it as follows:

library(purrr)

buscar_nome <- function(lista, nome) {
  map_lgl(lista, ~any(nome %in% .x[[1]])) %>% which()  
}

# > buscar_nome(lista, "Maria da Silva")
# [1] 1
# > buscar_nome(lista, "Mauro Pereira")
# [1] 2

An important assumption I am making is that the searched name is in the first column of each data.frame. The function can be modified as follows to search all columns, at some cost in efficiency:

buscar_nome <- function(lista, nome) {
  map_lgl(lista, ~any(nome %in% as.matrix(.x))) %>% which()  
} 
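
For anyone who prefers not to depend on purrr, the same idea can probably be written in base R. This is only a sketch: buscar_nome_base is a hypothetical name, and it checks each column in turn rather than converting the whole data.frame to a matrix.

```r
df1 <- data.frame(nome = c("José da Silva", "Maria da Silva"),
                  idade = c(45, 54))
df2 <- data.frame(nome_completo = c("Mauro Pereira", "João Paulo"),
                  idade = c(30, 12))
lista <- list(df1, df2)

# Hypothetical base R counterpart of buscar_nome: returns the indices of
# the list elements in which any column contains the exact name
buscar_nome_base <- function(lista, nome) {
  contem <- vapply(lista, function(df) {
    # TRUE if at least one column of this data.frame holds the name
    any(vapply(df, function(coluna) any(as.character(coluna) %in% nome),
               logical(1)))
  }, logical(1))
  which(contem)
}

buscar_nome_base(lista, "Mauro Pereira")
# [1] 2
```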
    
18.09.2018 / 00:14
5

I made a small change in your data to increase the number of cases:

df1 <- data.frame(nome = c("José da Silva", "Maria da Silva"),
              idade = c(45, 54))
df2 <- data.frame(nome_completo = c("Mauro Pereira", "João Paulo", "João Pedro"),
              idade = c(30, 12, 1))
df3 <- data.frame(renda = c(1, 2, 3),
              idade = c(3, 2, 9),
              nome_do_cabra = c("Antônio Augusto", "João Marcos", "João Ivo"))

lista <- list()
lista[[1]] <- df1
lista[[2]] <- df2
lista[[3]] <- df3

See if this function solves your problem. It is not very efficient (a loop inside a loop, etc.), but I think it does the job.

procura_nome <- function(x, pattern) {
  list_result <- list()
  # Loop over every data.frame in the list and every column of each one;
  # seq_along/seq_len also behave correctly for empty inputs
  for (j in seq_along(x)) {
    for (k in seq_len(ncol(x[[j]]))) {
      # Rows of column k whose content matches the pattern
      linhas_result <- grep(pattern = pattern, x = x[[j]][[k]])
      if (length(linhas_result) > 0) {
        list_result[[length(list_result) + 1]] <- cbind(j, k, linhas_result)
      }
    }
  }
  # No match anywhere in the list
  if (length(list_result) == 0) {
    return(NULL)
  }
  # Stack the per-column matches into a single data.frame
  df_result <- as.data.frame(do.call(rbind, list_result))
  names(df_result) <- c("numero_lista", "numero_coluna", "numero_linha")
  df_result
}

Because the string search function used internally is grep, the pattern is treated as a regular expression, so you can search for names in a non-exact way. It could of course be improved, for example to make it case-insensitive or to ignore accents.
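
As a rough illustration of the case-insensitive improvement mentioned above, grepl accepts ignore.case = TRUE. The sketch below is an assumption about how that could look, not part of the original function: procura_sem_caixa is a hypothetical helper that only reports which list elements match, and it does not deal with accents.

```r
df1 <- data.frame(nome = c("José da Silva", "Maria da Silva"),
                  idade = c(45, 54))
df2 <- data.frame(nome_completo = c("Mauro Pereira", "João Paulo"),
                  idade = c(30, 12))
lista <- list(df1, df2)

# Hypothetical helper: indices of the list elements that have a
# case-insensitive match for the pattern in any column
procura_sem_caixa <- function(lista, pattern) {
  achou <- vapply(lista, function(df) {
    any(vapply(df, function(coluna) {
      any(grepl(pattern, as.character(coluna), ignore.case = TRUE))
    }, logical(1)))
  }, logical(1))
  which(achou)
}

procura_sem_caixa(lista, "maria")
# [1] 1
```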

procura_nome returns a data.frame with one column giving the index of the element within the list, another giving the column of the data.frame, and a third giving the row, as in the following:

procura_nome(lista, "João")
###   numero_lista numero_coluna numero_linha
### 1            2             1            2
### 2            2             1            3
### 3            3             3            2
### 4            3             3            3
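
A result in this shape can then be used to answer the original question or to pull out the matched values. A small sketch, with the result above rebuilt by hand so that it runs on its own; resultado and valores are illustrative names:

```r
df1 <- data.frame(nome = c("José da Silva", "Maria da Silva"),
                  idade = c(45, 54))
df2 <- data.frame(nome_completo = c("Mauro Pereira", "João Paulo", "João Pedro"),
                  idade = c(30, 12, 1))
df3 <- data.frame(renda = c(1, 2, 3),
                  idade = c(3, 2, 9),
                  nome_do_cabra = c("Antônio Augusto", "João Marcos", "João Ivo"))
lista <- list(df1, df2, df3)

# Same content as the output of procura_nome(lista, "João") shown above
resultado <- data.frame(numero_lista  = c(2, 2, 3, 3),
                        numero_coluna = c(1, 1, 3, 3),
                        numero_linha  = c(2, 3, 2, 3))

# Which list elements contain at least one match
unique(resultado$numero_lista)
# [1] 2 3

# Look up the matched cell for every row of the result
valores <- mapply(function(j, k, i) as.character(lista[[j]][i, k]),
                  resultado$numero_lista,
                  resultado$numero_coluna,
                  resultado$numero_linha)
valores
```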
    
17.09.2018 / 22:52