By default, caret tunes some hyperparameters of each model. It tries to do this in a clever way, but that is not always the right fit for your problem. rpart, on the other hand, fits the model exactly as you defined it.
caret is not very clear about this, and it sometimes causes confusion.
In this case, for rpart, caret tunes the hyperparameter cp (complexity). It chooses the grid of values to test using the following function:
> getModelInfo("rpart")[[1]]$grid
function (x, y, len = NULL, search = "grid")
{
    dat <- if (is.data.frame(x))
        x
    else as.data.frame(x)
    dat$.outcome <- y
    initialFit <- rpart(.outcome ~ ., data = dat, control = rpart.control(cp = 0))$cptable
    initialFit <- initialFit[order(-initialFit[, "CP"]), , drop = FALSE]
    if (search == "grid") {
        if (nrow(initialFit) < len) {
            tuneSeq <- data.frame(cp = seq(min(initialFit[, "CP"]),
                max(initialFit[, "CP"]), length = len))
        }
        else tuneSeq <- data.frame(cp = initialFit[1:len, "CP"])
        colnames(tuneSeq) <- "cp"
    }
    else {
        tuneSeq <- data.frame(cp = unique(sample(initialFit[,
            "CP"], size = len, replace = TRUE)))
    }
    tuneSeq
}
In short, this function:
- fits a model with every parameter at its rpart default except cp (complexity), which is set to 0.
- takes the cptable element of that fit, which the rpart documentation defines as:
  cptable : a matrix of information on the optimal prunings based on a complexity parameter.
- builds a sequence of cp values from that table, with its length given by the tuneLength argument of the train function.
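The steps above can be reproduced by hand, which makes it easier to see where the candidate cp values come from. This is only a minimal sketch of the "grid" branch, not caret's own code path: the iris dataset and the names fit0, cp_table, len and candidate_cp are illustrative assumptions, and len = 3 is chosen to match the default tuneLength of train.

```r
library(rpart)

# Step 1: fit a tree with rpart defaults except cp = 0.
# iris is only a stand-in dataset for this illustration.
fit0 <- rpart(Species ~ ., data = iris, control = rpart.control(cp = 0))

# Step 2: take the cptable and sort it by decreasing CP.
cp_table <- fit0$cptable
cp_table <- cp_table[order(-cp_table[, "CP"]), , drop = FALSE]

# Step 3: derive `len` candidate values (len plays the role of tuneLength).
len <- 3
if (nrow(cp_table) < len) {
  # Fewer rows than requested: spread values evenly between min and max CP.
  candidate_cp <- seq(min(cp_table[, "CP"]), max(cp_table[, "CP"]),
                      length.out = len)
} else {
  # Otherwise keep the `len` largest CP values from the table.
  candidate_cp <- unname(cp_table[1:len, "CP"])
}

print(candidate_cp)
```

The values printed here are the kind of grid that train would then evaluate by resampling, so they depend on the data and not on anything you set in rpart.control.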
This behavior can be changed. Read here for more information: link
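One common way to change it is to pass your own grid through the tuneGrid argument of train, or to turn resampling off with trainControl(method = "none") and supply a single cp value. The sketch below assumes the iris dataset and arbitrary cp values chosen purely for illustration; it is not taken from the linked page.

```r
library(caret)

set.seed(1)

# Option A: supply your own cp grid instead of the automatic one.
# The three cp values are arbitrary examples.
my_grid <- data.frame(cp = c(0.001, 0.01, 0.1))
fit_grid <- train(Species ~ ., data = iris, method = "rpart",
                  tuneGrid = my_grid,
                  trControl = trainControl(method = "cv", number = 5))

# Option B: no resampling, one fixed cp. With method = "none" the
# tuneGrid must have exactly one row. cp = 0.01 is rpart's own default.
fit_fixed <- train(Species ~ ., data = iris, method = "rpart",
                   tuneGrid = data.frame(cp = 0.01),
                   trControl = trainControl(method = "none"))

print(fit_grid$results[, c("cp", "Accuracy")])
```

Option B is the closer match to calling rpart directly, since no tuning happens and the tree is fitted once with the cp you chose.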