Rename the levels of a factor based on a data frame

5

Suppose I have the data frame iris, which is already loaded in R:

head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Suppose I also have a data frame named flores, with the following structure:

flores <- data.frame(Especies=c("setosa", "virginica", "versicolor"), 
                     Nome=c("Flor 1", "Flor 2", "Flor 3"))
    Especies   Nome
1     setosa Flor 1
2  virginica Flor 2
3 versicolor Flor 3

I'd like to replace the values of iris$Species with the corresponding values of flores$Nome. That is, I would like every occurrence of setosa in iris$Species to be replaced with Flor 1, every occurrence of virginica to be replaced with Flor 2, and every occurrence of versicolor to be replaced with Flor 3.

Using something like if or ifelse is out of the question, because the dataset I'm actually working with has thousands of different species. It would be impossible to type out every case by hand.

    
asked by anonymous 02.12.2017 / 20:11

2 answers

3

I would do a left_join and then drop the original Species column. For example:

> library(dplyr)
> flores <- data.frame(Especies=c("setosa", "virginica", "versicolor"), 
+                      Nome=c("Flor 1", "Flor 2", "Flor 3"))
> 
> iris <- left_join(iris, flores, by = c("Species" = "Especies")) %>%
+   select(-Species) %>%
+   rename(Species = Nome)
> 
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  Flor 1
2          4.9         3.0          1.4         0.2  Flor 1
3          4.7         3.2          1.3         0.2  Flor 1
4          4.6         3.1          1.5         0.2  Flor 1
5          5.0         3.6          1.4         0.2  Flor 1
6          5.4         3.9          1.7         0.4  Flor 1

Using case_when could also be an option, but it makes little sense when you already have the names in a data.frame.

By the way, there is also the fct_recode function from the forcats package.
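A minimal sketch of how fct_recode could be used here, assuming the forcats package is installed. fct_recode expects pairs of the form new = "old", so the lookup vector is built with the new names as names and the old levels as values, then spliced in with !!!. The names lookup and iris2 are only illustrative and are not part of the original answer.

```r
library(forcats)

flores <- data.frame(Especies = c("setosa", "virginica", "versicolor"),
                     Nome = c("Flor 1", "Flor 2", "Flor 3"),
                     stringsAsFactors = FALSE)

# Named vector in the shape fct_recode wants: names = new levels, values = old levels
lookup <- setNames(flores$Especies, flores$Nome)

iris2 <- iris  # work on a copy so the built-in iris stays untouched

# !!! splices the named vector into individual new = "old" arguments
iris2$Species <- fct_recode(iris2$Species, !!!lookup)

head(iris2$Species)
```

Since the pairs come straight from flores, nothing has to be typed by hand, which is what matters with thousands of species.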

    
04.12.2017 / 13:37
4

I believe the following code solves the problem. I did run into trouble with the columns involved, because they are of class factor. So first I added the argument stringsAsFactors = FALSE when creating the data frame flores, and then I converted the column Species to character.

flores <- data.frame(Especies=c("setosa", "virginica", "versicolor"), 
                     Nome=c("Flor 1", "Flor 2", "Flor 3"),
                     stringsAsFactors = FALSE)

iris$Species <- as.character(iris$Species)

for(s in unique(iris$Species)){
    iris$Species[iris$Species == s] <- flores$Nome[flores$Especies == s]
}

iris$Species <- factor(iris$Species)    # convert back to factor

If the column Nome of flores has to be a factor, then you should use

iris$Species[iris$Species == s] <- as.character(flores$Nome[flores$Especies == s])

inside the for loop.
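As an alternative to the loop, the levels of the factor can be renamed directly with match(). This is a different technique from the one above, shown only as a sketch. It assumes every level of iris$Species appears exactly once in flores$Especies. The names iris2 and idx are illustrative.

```r
flores <- data.frame(Especies = c("setosa", "virginica", "versicolor"),
                     Nome = c("Flor 1", "Flor 2", "Flor 3"),
                     stringsAsFactors = FALSE)

iris2 <- iris  # work on a copy so the built-in iris stays untouched

# For each level of Species, find its row in the lookup table
idx <- match(levels(iris2$Species), flores$Especies)

# Stop early if some level has no translation in flores
stopifnot(!anyNA(idx))

# Rename the levels; as.character() also covers the case where Nome is a factor
levels(iris2$Species) <- as.character(flores$Nome[idx])

head(iris2)
```

Only the levels are touched, not the individual rows, so the column stays a factor and no conversion to character and back is needed.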

    
02.12.2017 / 23:46