Creating Dummy Variables


I'm trying to turn every variable in my data set into dummy variables:

> dados
  X1 X2 X3
1  1  3  1
2  3  2  1
3  3  2  1
4  2  3  2
5  2  3  3

I'm trying to create binary (0/1) vectors for this, but I can't get it right. Since each variable has 3 categories, I need k-1 dummy variables per variable (k being the number of categories), which gives 2 dummy variables per variable.
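The k-1 arithmetic can be checked directly in R. This is a minimal sketch using the toy data from the question, assuming the categories are the codes 1, 2 and 3; the helper names (k, per_variable, total, observed) are only illustrative.

```r
# Toy data from the question (raw numeric codes)
dados <- data.frame(X1 = c(1, 3, 3, 2, 2),
                    X2 = c(3, 2, 2, 3, 3),
                    X3 = c(1, 1, 1, 2, 3))

# Assumption: every variable can take the categories 1, 2 and 3
k <- 3
per_variable <- k - 1                # 2 dummies per variable
total <- ncol(dados) * per_variable  # 6 dummies in all (a to f)

# Counting only the categories that actually occur gives fewer for X2,
# because the value 1 never appears in that column
observed <- sapply(dados, function(v) length(unique(v)) - 1)
print(observed)
```

The second count matters in practice: any tool that infers the categories from the data will see only two of them in X2 and produce a single dummy for it.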

What I tried was this:

library(mlr)
createDummyFeatures(dados,cols=NULL)

   1 2 3
1  1 0 0
2  0 0 1
3  0 0 1
4  0 1 0
5  0 1 0
6  0 0 1
7  0 1 0
8  0 1 0
9  0 0 1
10 0 0 1
11 1 0 0
12 1 0 0
13 1 0 0
14 0 1 0
15 0 0 1
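A 3-column result is what is usually called one-hot or "1-of-n" encoding: one column per category. As I understand the mlr documentation, createDummyFeatures() uses method = "1-of-n" by default, and method = "reference" leaves out one level per variable to give k-1 columns. The difference between the two encodings can be sketched in base R with model.matrix(), using X1 from the question (x, one_hot and reference are illustrative names):

```r
x <- factor(c(1, 3, 3, 2, 2))  # X1 from the question

# One-hot ("1-of-n"): without an intercept, every level gets a column (k = 3)
one_hot <- model.matrix(~ x - 1)

# Reference (treatment) coding: the intercept absorbs level "1",
# leaving k - 1 = 2 columns once the intercept column is dropped
reference <- model.matrix(~ x)[, -1, drop = FALSE]

colnames(one_hot)    # "x1" "x2" "x3"
colnames(reference)  # "x2" "x3"
```

In the one-hot version every row sums to 1, so one column is always redundant; k-1 coding removes exactly that redundancy.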

Why does this return 3 dummy variables per variable, when k-1 should give two? Also, all the variables end up in the same columns! How do I solve these problems? The result should look like this:

   a b    c d    e f 
1  1 0    0 0    1 0
2  0 0    0 1    1 0
3  0 0    0 1    1 0
4  0 1    0 0    0 1
5  0 1    0 0    0 0
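To make the intended coding explicit, the target table can be built by hand in base R. This sketch assumes the categories are 1, 2 and 3 with 3 as the reference (all-zero) category, and uses the illustrative names X1.1, X1.2, ... in place of a to f; kept_levels and dummies are hypothetical helper names.

```r
dados <- data.frame(X1 = c(1, 3, 3, 2, 2),
                    X2 = c(3, 2, 2, 3, 3),
                    X3 = c(1, 1, 1, 2, 3))

# Assumption: categories are 1, 2, 3 and 3 is the reference category,
# so only levels 1 and 2 get their own 0/1 column
kept_levels <- c(1, 2)

dummies <- do.call(cbind, lapply(names(dados), function(v) {
  # one column per kept level: 1 where the variable equals that level
  cols <- sapply(kept_levels, function(lv) as.integer(dados[[v]] == lv))
  colnames(cols) <- paste(v, kept_levels, sep = ".")
  cols
}))
print(dummies)
```

Because kept_levels is fixed instead of being read from the data, X2 still gets an X2.1 column (all zeros) even though 1 never occurs in it.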
    
asked by anonymous 05.11.2018 / 14:28

1 answer


The closest I got to the result you expect was with the dummyVars function from the caret package. The result is not exactly the same because the value 1 never occurs in column X2 of your example, so no dummy column is created for it.

First, the variables have to be stored as factors:

dados <- data.frame(X1 = as.factor(c(1, 3, 3, 2, 2)),
                    X2 = as.factor(c(3, 2, 2, 3, 3)),
                    X3 = as.factor(c(1, 1, 1, 2, 3)))
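Since the dummy columns are derived from the factor levels, it helps to check which levels as.factor() actually created. A small sketch (x2_full is an illustrative name, not part of the original answer):

```r
dados <- data.frame(X1 = as.factor(c(1, 3, 3, 2, 2)),
                    X2 = as.factor(c(3, 2, 2, 3, 3)),
                    X3 = as.factor(c(1, 1, 1, 2, 3)))

# as.factor() only creates levels for values present in the data,
# so X2 ends up with just "2" and "3"
str(lapply(dados, levels))

# Declaring the level set explicitly keeps "1" as an (empty) level
x2_full <- factor(dados$X2, levels = c("1", "2", "3"))
table(x2_full)
```

This is why X2 contributes only a single dummy column in the result below.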

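Then I changed the reference level of each factor to "3", so that 3 becomes the all-zero category as in your expected output. The level is quoted on purpose: relevel() treats a numeric ref as the position of the level, and X2 only has two levels ("2" and "3"), so ref = 3 would stop with an error for it.

dados$X1 <- relevel(dados$X1, ref = "3")
dados$X2 <- relevel(dados$X2, ref = "3")
dados$X3 <- relevel(dados$X3, ref = "3")

Finally, I created the dummy variables with the caret package. With fullRank = TRUE, the first level of each factor (now "3") is dropped, leaving k-1 columns per variable:

library(caret)
dummy <- dummyVars(~ ., data = dados, fullRank = TRUE)

The result is:

predict(dummy, dados)

  X1.1 X1.2 X2.2 X3.1 X3.2
1    1    0    0    1    0
2    0    0    1    1    0
3    0    0    1    1    0
4    0    1    0    0    1
5    0    1    0    0    0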
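Without caret, base R's model.matrix() gives the same reference coding, because unordered factors use contr.treatment by default and their first level is dropped. A minimal sketch, assuming the default contrasts options: listing the levels as 3, 1, 2 makes "3" the reference, and because the full level set is declared, X2 also keeps its all-zero column for level 1, giving the six columns asked for in the question. The names raw, dados_full and mm are illustrative.

```r
raw <- data.frame(X1 = c(1, 3, 3, 2, 2),
                  X2 = c(3, 2, 2, 3, 3),
                  X3 = c(1, 1, 1, 2, 3))

# Full level set with "3" first, so treatment contrasts use it as the baseline
dados_full <- as.data.frame(lapply(raw, factor, levels = c(3, 1, 2)))

# ~ . expands to X1 + X2 + X3; [, -1] drops the intercept column
mm <- model.matrix(~ ., data = dados_full)[, -1]
print(mm)
```

The column names follow the model.matrix() convention of appending the level directly to the variable name (X11 for X1 = 1), whereas dummyVars inserts a "." separator (X1.1).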
    
05.11.2018 / 17:34