How to decode many columns of a data frame in R

6

I have a data frame with more than 300 columns that are categorical but are encoded as numeric. Each of these columns has its own "type", that is, it has its own coding table. My problem is to create a new data frame with decoded variables.

I have loaded the following data frames:

  • The main data frame, "dados", which has 347 columns that I want to decode.
  • An auxiliary data frame, "dados_vars", with the name (variable.name) and "type" (data.type) of every variable in the main data frame.
  • An auxiliary data frame, "codes", with the "type" (data.type), the possible codes for that type (value), and the meaning of each code (content).
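
To make the layout concrete, here is a minimal sketch of the three tables, using the object names that the code in this question uses (dados, dados_vars, codes). The column names come from the description above; the variable "xyz", the type names and all values are invented purely for illustration.

```r
# Hypothetical toy versions of the three data frames.
# All values, and the type names "colour" and "yesno", are invented.
dados <- data.frame(abc = c(1, 2, 2), xyz = c(0, 1, 0))

# One row per column of the main data frame: its name and its "type".
dados_vars <- data.frame(
  variable.name = c("abc", "xyz"),
  data.type     = c("colour", "yesno"),
  stringsAsFactors = FALSE
)

# One row per (type, code) pair, with the meaning of the code.
codes <- data.frame(
  data.type = c("colour", "colour", "yesno", "yesno"),
  value     = c(1, 2, 0, 1),
  content   = c("red", "blue", "no", "yes"),
  stringsAsFactors = FALSE
)
```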

I'm trying to use dplyr to make this easier. What I have managed to do so far is:

library(dplyr)

# pick one of the variables from the main data frame
variavel <- "abc"
# look up this variable's type in "dados_vars"
tipo.variavel <- as.character(dados_vars[dados_vars$variable.name == variavel, "data.type"])
# keep only the codes that this variable can take
codigos <- codes %>% filter(data.type == tipo.variavel) %>% select(value, content)
# create a new data frame with this variable decoded
novos.dados <- mutate(dados, abc = factor(abc, levels = codigos$value, labels = codigos$content))

Now, how do I apply this procedure to all columns in the main df?

    
asked by anonymous 15.04.2015 / 19:07

2 answers

0

I ended up adopting the solution below, based on the tips given here.

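for (i in colnames(dados)) {
    # type of the current variable
    tipo.variavel <- as.character(dados_vars[dados_vars$variable.name == i, "data.type"])
    # codes and labels for that type (case-insensitive match)
    fatores.variavel <- subset(codes, toupper(data.type) == toupper(tipo.variavel), c("value", "content"))
    # store the decoded variable in a new column named "<variable>.new"
    dados[, paste0(i, ".new")] <- factor(dados[, i], levels = fatores.variavel$value, labels = fatores.variavel$content)
}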
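
One thing worth checking with a loop like this: factor() turns any value that is not listed in levels into NA without a warning, so a code missing from the code table disappears silently. A minimal sketch of a check, assuming a hypothetical helper uncoded_values() and invented data:

```r
# Hypothetical helper: values present in a column that have no entry
# in the code table for that column's type.
uncoded_values <- function(x, valid_codes) {
  setdiff(unique(x[!is.na(x)]), valid_codes)
}

x     <- c(1, 2, 9, NA)   # invented column; 9 is not in the code table
valid <- c(1, 2, 3)       # invented codes for this column's type

uncoded_values(x, valid)  # reports 9

# After decoding, both the original NA and the uncoded 9 are NA.
decoded <- factor(x, levels = valid, labels = c("a", "b", "c"))
sum(is.na(decoded))
```

Running such a check on each column before decoding makes it possible to tell missing data apart from gaps in the code table.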

Thanks for the help.

    
16.04.2015 / 18:03
2

A solution using only base R:

# example data: 10 coded columns, a type for each column, and a code table
dados <- data.frame(replicate(10, sample(1:3, 10, replace = TRUE)))
dados_vars <- data.frame(variable.name = paste0('X', 1:10), data.type = sample(1:4, 10, replace = TRUE))
codes <- data.frame(tipo = rep(1:4, each = 3), value = rep(1:3, 4), code = letters[1:12])

for (i in colnames(dados)) {
    # type of the current column
    tipo.atual <- dados_vars[dados_vars$variable.name == i, 'data.type']
    # codes that belong to that type
    codigos.atuais <- subset(codes, tipo == tipo.atual)
    dados[, i] <- factor(dados[, i], levels = codigos.atuais$value, labels = codigos.atuais$code)
}
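
The same idea can also be written without an explicit loop. The sketch below is one possible variant, not a replacement for the loop above: it wraps the lookup in a hypothetical helper, decode_column(), and applies it to every column with Map(). It uses the column names from this answer (tipo, value, code) and small fixed data instead of sample() so that the result is reproducible.

```r
# Hypothetical helper: decode one column given its name.
decode_column <- function(x, var_name, vars, codes) {
  # type of this variable
  type <- vars$data.type[vars$variable.name == var_name]
  # rows of the code table that belong to that type
  lookup <- codes[codes$tipo == type, ]
  factor(x, levels = lookup$value, labels = lookup$code)
}

# Small fixed example (invented values).
dados <- data.frame(X1 = c(1, 2, 3), X2 = c(3, 3, 1))
dados_vars <- data.frame(variable.name = c("X1", "X2"),
                         data.type = c(1, 2),
                         stringsAsFactors = FALSE)
codes <- data.frame(tipo = rep(1:2, each = 3),
                    value = rep(1:3, 2),
                    code = letters[1:6],
                    stringsAsFactors = FALSE)

# Apply the helper to each column together with its name;
# assigning into decoded[] keeps the data frame structure.
decoded <- dados
decoded[] <- Map(decode_column, dados, colnames(dados),
                 MoreArgs = list(vars = dados_vars, codes = codes))
```

Keeping the result in a separate object (decoded) leaves the original coded data untouched, which matches the goal in the question of creating a new data frame.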
    
15.04.2015 / 22:30