Classify a column of a data frame in R

I have a data frame with 89,000 rows, and one of its columns holds each person's relationship to the employee. I need to split these relationships into 4 classes, namely:

  • Class 1 - Spouse / Children
  • Class 2 - Mother / Father
  • Class 3 - Siblings
  • Other - Other relatives

I need to add a column to the data frame that holds the kin class for each row (the original relationship column must be kept). I did it with a set of nested ifelse calls, but I would like to know if there is a more "elegant" solution.

# Assign the result of the nested ifelse once, instead of assigning inside each branch
base.dados$CLASSE <- ifelse(base.dados$Parentesco %in% classe1, "CLASSE 1",
                     ifelse(base.dados$Parentesco %in% classe2, "CLASSE 2",
                     ifelse(base.dados$Parentesco %in% classe3, "CLASSE 3", "OUTRAS")))
    
asked by anonymous 21.09.2017 / 14:37

2 answers

For me, the most elegant way would be to write a function that replaces the chain of ifelse calls; the same function can then be generalized to other situations. Example:

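# Named list: each name is a class, each element holds the relationships in that class
classes_parentescos <- list("CLASSE 1"=c("conjuge", "filho"), 
                "CLASSE 2"=c("mae", "pai"), 
                "CLASSE 3"=c("outros")
                )

# Return the name of the first class whose vector contains x exactly;
# `default` (an added argument) is returned when no class matches
get_class_name <- function(x, classes=classes_parentescos, default="OUTRAS"){
        pos <- which(vapply(classes, function(v) x %in% v, logical(1)))
        if (length(pos) == 0) default else names(classes)[pos[1]]
}

# vapply guarantees a character vector even when some values match no class
base.dados$CLASSE <- vapply(as.character(base.dados$Parentesco), get_class_name,
                            character(1), USE.NAMES=FALSE)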
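With 89,000 rows, calling a function once per row can be slow. As a rough sketch of an alternative (not the answer's method), the same named list can be flattened into a lookup table with stack() and applied in one vectorized match(). The small base.dados below and the "OUTRAS" fallback are assumptions made only for illustration.

```r
# Hypothetical example data; the real base.dados is not shown in the question
classes_parentescos <- list("CLASSE 1" = c("conjuge", "filho"),
                            "CLASSE 2" = c("mae", "pai"))
base.dados <- data.frame(Parentesco = c("pai", "filho", "tio", "conjuge"),
                         stringsAsFactors = FALSE)

# stack() turns the named list into a two-column table:
# `values` holds the relationship, `ind` holds the class name
lookup <- stack(classes_parentescos)

# match() returns the row of the lookup table for each relationship, or NA if absent
idx <- match(base.dados$Parentesco, lookup$values)

base.dados$CLASSE <- as.character(lookup$ind)[idx]
base.dados$CLASSE[is.na(idx)] <- "OUTRAS"   # fallback for relationships in no class

base.dados
```

Adding a new class then only means adding an element to the list; the matching code stays unchanged.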
    
22.09.2017 / 00:12

Since we do not have an example of base.dados, I created a data.frame. If you want to avoid so many ifelse calls, you can do something like this.

set.seed(6399)  # Makes the code reproducible

classe1 <- c("Conjugue", "Filho", "Filha")
classe2 <- c("Mãe", "Pai")
classe3 <- c("Irmão", "Irmã")
classe4 <- c("Tio", "Tia", "Avô", "Avó")

base.dados <- data.frame(
    ID = 1:20,
    Parentesco = sample(c(classe1, classe2, classe3, classe4), 20, TRUE)
)
base.dados

base.dados$CLASSE <- "OUTRAS"
base.dados$CLASSE[base.dados$Parentesco %in% classe1] <- "CLASSE 1"
base.dados$CLASSE[base.dados$Parentesco %in% classe2] <- "CLASSE 2"
base.dados$CLASSE[base.dados$Parentesco %in% classe3] <- "CLASSE 3"
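If the number of classes grows, the repeated assignment lines can be folded into a loop over a named list. This is only a sketch of that idea: the list name classes, the loop variable nome, and the four-row base.dados are assumptions, not part of the original answer.

```r
# Hypothetical named list built from the class vectors used above
classes <- list("CLASSE 1" = c("Conjugue", "Filho", "Filha"),
                "CLASSE 2" = c("Mãe", "Pai"),
                "CLASSE 3" = c("Irmão", "Irmã"))

# Small illustrative data frame
base.dados <- data.frame(Parentesco = c("Pai", "Filha", "Tio", "Irmã"),
                         stringsAsFactors = FALSE)

# Start with the default, then overwrite the rows that belong to each class
base.dados$CLASSE <- "OUTRAS"
for (nome in names(classes)) {
    base.dados$CLASSE[base.dados$Parentesco %in% classes[[nome]]] <- nome
}

base.dados
```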

If the column contains NA values, wrap the logical index in which(). The first line stays the same; only the assignment lines change.

base.dados$CLASSE <- "OUTRAS"
base.dados$CLASSE[which(base.dados$Parentesco %in% classe1)] <- "CLASSE 1"

And the same for the other classes.
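To see where which() makes a difference, the short sketch below compares %in% with == on a vector containing NA. The vector parentesco is made up for the example. Note that %in% itself returns FALSE for a missing value, so the which() wrapper matters mostly when the index is built with a comparison such as ==.

```r
classe2 <- c("Mãe", "Pai")
parentesco <- c("Pai", NA, "Tio")   # hypothetical values with one missing entry

in_idx <- parentesco %in% classe2   # NA is treated as "no match", so no NA in the result
eq_idx <- parentesco == "Pai"       # the NA propagates into the logical index
pos    <- which(eq_idx)             # which() keeps only the positions that are TRUE

print(in_idx)
print(eq_idx)
print(pos)
```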

    
21.09.2017 / 18:03