How to remove unused factor levels from a data frame

5

Suppose I have the following data frame:

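# stringsAsFactors = TRUE is needed in R >= 4.0.0, where strings are no longer converted to factors by default
df <- data.frame(categorias = c("A", "B", "C", "D", "E"),
                 valores = 1:5,
                 stringsAsFactors = TRUE)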

When I take a subset of that data frame, the categories I removed are still listed as levels.

subdf <- subset(df, valores <= 3)
levels(subdf$categorias)
[1] "A" "B" "C" "D" "E"
    
asked by anonymous 02.04.2014 / 19:07

1 answer

4

You can use droplevels():

subdf <- droplevels(subset(df, valores <= 3))

Result:

levels(subdf$categorias)
[1] "A" "B" "C"
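
For a single column, the same result can be reached by calling droplevels() on the factor itself, or by rebuilding it with factor(). The following is a minimal sketch rather than part of the original answer; it assumes the column is a factor, hence stringsAsFactors = TRUE, and the names single_a and single_b are only illustrative.

```r
df <- data.frame(categorias = c("A", "B", "C", "D", "E"),
                 valores = 1:5,
                 stringsAsFactors = TRUE)  # assumption: categorias is a factor
subdf <- subset(df, valores <= 3)

# Option 1: droplevels() also works on a single factor
single_a <- droplevels(subdf$categorias)

# Option 2: factor() rebuilds the levels from the values actually present
single_b <- factor(subdf$categorias)

levels(single_a)
levels(single_b)
```

Both calls leave only the levels that still occur in the subset; droplevels() on the whole data frame is more convenient when several columns are factors.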

The advantage is that this works for more than one factor variable at the same time. For example, if your data.frame is:

df <- data.frame(categorias = c("A", "B", "C", "D", "E"),
                 categorias2 = c("F", "G", "H", "I", "J"),
                 valores = 1:5,
                 valores2 = rnorm(5),
                 stringsAsFactors = TRUE)

If you only call subset(), both categorias and categorias2 keep all five of their original levels. With subdf <- droplevels(subset(df, valores <= 3)) the unused levels are dropped from every factor column at once.
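
To check this, one can count the levels of each factor column before and after droplevels(). This is a sketch under the same assumption as above (stringsAsFactors = TRUE so the character columns become factors); the names sub_raw, sub_clean and is_fac are introduced here only for illustration.

```r
df <- data.frame(categorias = c("A", "B", "C", "D", "E"),
                 categorias2 = c("F", "G", "H", "I", "J"),
                 valores = 1:5,
                 valores2 = rnorm(5),
                 stringsAsFactors = TRUE)  # assumption: make both columns factors

sub_raw   <- subset(df, valores <= 3)  # rows removed, levels kept
sub_clean <- droplevels(sub_raw)       # unused levels removed as well

# Logical vector marking which columns are factors
is_fac <- sapply(df, is.factor)

# Number of levels per factor column, before and after
levels_before <- sapply(sub_raw[is_fac], nlevels)
levels_after  <- sapply(sub_clean[is_fac], nlevels)
levels_before
levels_after
```

The numeric columns valores and valores2 are left untouched, since droplevels() only acts on factor columns.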

    
02.04.2014 / 19:10