Different results using rpart and caret


Hello,

I'm testing some regression models and there is one thing I don't quite understand: I fitted a model with the rpart function from the rpart package, and then another one with the train function from the caret package, using method = "rpart":

resultRPart <- rpart(EVADIU ~ ., data = data.rose)
resultCaret <- train(EVADIU ~ ., data = data.rose, method = "rpart")

I expected both to give the same results (precision, recall, etc.), but they don't.

The first one gave:

precision: 0.599
recall: 0.412

The second one gave:

precision: 0.1439
recall: 0.6759

Is this normal, or am I comparing apples with oranges here?

asked by anonymous 21.08.2017 / 15:55

1 answer


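By default, caret tunes some of the hyperparameters of each model: it evaluates a few candidate values by resampling (25 bootstrap resamples by default), picks the best one (by accuracy, for classification) and refits the final model on the whole data set with that value. It tries to do this sensibly, but the chosen value is not always the right one for your problem. rpart, on the other hand, fits the model exactly as you specified it, with its own default settings.

caret is not very explicit about this, and it sometimes causes confusion.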
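A minimal sketch of how to see what caret actually tuned. It assumes the kyphosis data set that ships with rpart as a stand-in for data.rose (which isn't available here). train() keeps the resampled performance of every candidate value of cp, rpart's complexity parameter, in results, and the chosen value in bestTune.

```r
library(rpart)   # provides rpart() and the kyphosis example data
library(caret)   # provides train()

set.seed(42)     # resampling is random; fix the seed for reproducibility

# Default train(): tuneLength = 3 candidate cp values, bootstrap resampling
fit_caret <- train(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "rpart")

print(fit_caret$results)   # resampled Accuracy/Kappa for each candidate cp
print(fit_caret$bestTune)  # the cp value used to fit the final model
```

On the question's data, the same inspection would be resultCaret$results and resultCaret$bestTune.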

In this case, for rpart, caret tunes the hyperparameter cp (the complexity parameter). It builds the grid of values to test with the following function:

> getModelInfo("rpart")[[1]]$grid
function (x, y, len = NULL, search = "grid") 
{
    dat <- if (is.data.frame(x)) 
        x
    else as.data.frame(x)
    dat$.outcome <- y
    initialFit <- rpart(.outcome ~ ., data = dat, control = rpart.control(cp = 0))$cptable
    initialFit <- initialFit[order(-initialFit[, "CP"]), , drop = FALSE]
    if (search == "grid") {
        if (nrow(initialFit) < len) {
            tuneSeq <- data.frame(cp = seq(min(initialFit[, "CP"]), 
                max(initialFit[, "CP"]), length = len))
        }
        else tuneSeq <- data.frame(cp = initialFit[1:len, "CP"])
        colnames(tuneSeq) <- "cp"
    }
    else {
        tuneSeq <- data.frame(cp = unique(sample(initialFit[, 
            "CP"], size = len, replace = TRUE)))
    }
    tuneSeq
}
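To make the search = "grid" branch concrete, here is a base-R sketch that reproduces its logic on a small hand-made matrix. The helper name make_cp_grid and the CP values are hypothetical, chosen only for illustration; len plays the role of train()'s tuneLength.

```r
# Hypothetical cptable-like matrix: illustrative values only
cptable <- cbind(CP = c(0.30, 0.05, 0.02, 0.00),
                 nsplit = c(0, 1, 3, 5))

# Hypothetical helper mirroring the search = "grid" branch of caret's function
make_cp_grid <- function(cptable, len = 3) {
  # Order rows from the largest to the smallest CP
  cptable <- cptable[order(-cptable[, "CP"]), , drop = FALSE]
  if (nrow(cptable) < len) {
    # Too few pruning points: len evenly spaced values between min and max CP
    data.frame(cp = seq(min(cptable[, "CP"]), max(cptable[, "CP"]),
                        length.out = len))
  } else {
    # Otherwise: the len largest CP values in the table
    data.frame(cp = cptable[1:len, "CP"])
  }
}

print(make_cp_grid(cptable, len = 3))                      # cp values 0.30, 0.05, 0.02
print(make_cp_grid(cptable[1:2, , drop = FALSE], len = 3)) # cp values 0.05, 0.175, 0.30
```

Either way the grid has len values, so with the default tuneLength = 3 caret compares three trees of different complexity.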

This function basically:

  • fits a model with all of rpart's default parameters except cp (complexity), which is set to cp = 0;
  • takes the cptable element of the result, which the rpart documentation defines as:

cptable: a matrix of information on the optimal prunings based on a complexity parameter.

  • builds a sequence of cp values whose length is given by the tuneLength argument of train (3 by default): the tuneLength largest CP values in the table, or evenly spaced values between the smallest and largest CP when the table has fewer rows than that.
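The same steps can be reproduced directly with rpart. This is a sketch that again assumes the kyphosis data as a stand-in for the question's data; it also prints the cp that a plain rpart() call uses, which is how the first model in the question was fitted.

```r
library(rpart)  # recommended package; kyphosis ships with it

# Step 1: all rpart defaults except cp = 0, so the tree is grown fully
full_fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  control = rpart.control(cp = 0))

# Step 2: the cptable, ordered from the largest to the smallest CP
cpt <- full_fit$cptable
cpt <- cpt[order(-cpt[, "CP"]), , drop = FALSE]
print(cpt)

# Step 3: with tuneLength = 3 and at least 3 rows, the grid is the 3 largest CPs
print(head(cpt[, "CP"], 3))

# A plain rpart() call, as in the question, uses a single fixed cp instead
default_cp <- rpart.control()$cp
print(default_cp)  # 0.01
```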

This behavior can be changed, for example with the tuneGrid and trControl arguments of train. Read here for more information: link
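A sketch of one way to make train() fit a single rpart model with rpart's own default cp, assuming kyphosis as a stand-in data set: pass a one-row tuneGrid and turn resampling off with trainControl(method = "none"). The cross-table at the end is only a check; it is not guaranteed to be diagonal on every data set.

```r
library(rpart)
library(caret)

# Plain rpart: every default, including cp = 0.01
fit_rpart <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# caret: one fixed cp value and no resampling, so nothing is tuned
fit_fixed <- train(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "rpart",
                   tuneGrid = data.frame(cp = 0.01),
                   trControl = trainControl(method = "none"))

# Cross-tabulate the class predictions of both models on the training data
pred_rpart <- predict(fit_rpart, kyphosis, type = "class")
pred_caret <- predict(fit_fixed, kyphosis)
print(table(rpart = pred_rpart, caret = pred_caret))
```

For the question's model, the formula and data would be EVADIU ~ . and data.rose.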

    
21.08.2017 / 18:14