I'm learning machine learning techniques to predict a numeric leaf size (tam_folha) from multiple numeric predictors. However, leaf size depends on the life form (forma_vida: tree or grass), and the two classes are not balanced. At the moment, I split the data by life form, partition each subset into training and test sets based on leaf size (the variable I want to predict), and fit a separate model for each class.

My question is: do I need a separate model for each class? Or is there a way to split the training and test data so that both classes are represented, and then fit a single model that predicts leaf size while taking the class (forma_vida) into account? Also, if anyone has a tip for someone who has never dealt with ML before: how should I handle the fact that the classes are not balanced?
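For the split itself, here is a minimal sketch of one option, assuming the caret package. Passing the factor forma_vida (instead of the numeric tam_folha) to createDataPartition samples about 70% of each life form. Both classes then keep roughly their original proportions in the training and test sets. The data frame below is synthetic and hypothetical (60 trees, 20 grasses, two made-up predictor columns). It only stands in for the real df.

```r
library(caret)

set.seed(42)
# Hypothetical, synthetic stand-in for df: 60 trees and 20 grasses,
# i.e. an unbalanced design like the one described above.
df <- data.frame(
  forma_vida = factor(rep(c("arvore", "grama"), times = c(60, 20))),
  X1036 = runif(80, 0.30, 0.40),
  X1037 = runif(80, 0.30, 0.40)
)
df$tam_folha <- 20 * df$X1036 + ifelse(df$forma_vida == "arvore", 2, 0) +
  rnorm(80, sd = 0.2)

# With a factor as y, createDataPartition samples ~70 % within each level,
# so trees and grasses both appear in the training and the test set.
index <- createDataPartition(df$forma_vida, p = 0.7, list = FALSE)
train_data <- df[index, ]
test_data  <- df[-index, ]

# The class proportions should be similar in the full data and in both splits.
print(prop.table(table(df$forma_vida)))
print(prop.table(table(train_data$forma_vida)))
print(prop.table(table(test_data$forma_vida)))
```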
library(caret)
library(dplyr)  # for %>% and filter()
# Part of the data (output of dput(head(df)), assigned so it can be run)
df <- structure(list(
  tam_folha = c(4L, 5L, 3L, 1L, 2L),
  forma_vida = structure(c(1L, 2L, 1L, 2L, 1L), levels = c("arvore", "grama"), class = "factor"),
  X1036 = c(0.349, 0.342, 0.383, 0.325, 0.309),
  X1037 = c(0.349, 0.342, 0.383, 0.325, 0.309),
  X1038 = c(0.349, 0.342, 0.383, 0.325, 0.309),
  X1039 = c(0.349, 0.342, 0.383, 0.325, 0.309),
  X1040 = c(0.349, 0.342, 0.383, 0.325, 0.31),
  X1041 = c(0.349, 0.342, 0.383, 0.326, 0.31)),
  row.names = c(NA, 5L), class = "data.frame")
# Filtering by class (forma_vida is dropped because it is constant within one class)
arvores <- df %>% dplyr::filter(forma_vida == "arvore") %>% dplyr::select(-forma_vida)
# Data partition
set.seed(123)  # make the split and the CV folds reproducible
index <- createDataPartition(arvores$tam_folha, p = 0.7, list = FALSE)
train_data <- arvores[index, ]
test_data <- arvores[-index, ]
# Repeated 10-fold CV: "repeat" is a reserved word in R; the argument is "repeats"
controle <- trainControl(method = "repeatedcv", number = 10, repeats = 5, selectionFunction = "oneSE")
mod1 <- train(tam_folha ~ ., data = train_data,
method = "pls",
metric = "RMSE",
tuneLength = 4,
trControl = controle)
## repeat for the other class (forma_vida == "grama")
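A sketch of the single-model alternative, under the same assumptions: the caret package (plus the pls package, which method = "pls" needs) and a hypothetical synthetic df with 60 trees and 20 grasses. The train/test split and the CV folds are both stratified by forma_vida. forma_vida stays in the formula, so PLS receives it as a dummy variable. Test error is reported per class, so the smaller class (grasses) is not hidden behind the overall RMSE. Whether one pooled model beats two separate ones is an empirical question. Comparing these per-class numbers with those of the per-class models is one way to decide.

```r
library(caret)  # train(method = "pls") also requires the pls package

set.seed(42)
# Hypothetical, synthetic stand-in for df: 60 trees and 20 grasses.
df <- data.frame(
  forma_vida = factor(rep(c("arvore", "grama"), times = c(60, 20))),
  X1036 = runif(80, 0.30, 0.40),
  X1037 = runif(80, 0.30, 0.40)
)
df$tam_folha <- 20 * df$X1036 + ifelse(df$forma_vida == "arvore", 2, 0) +
  rnorm(80, sd = 0.2)

# Train/test split stratified by life form (about 70 % of each class).
index <- createDataPartition(df$forma_vida, p = 0.7, list = FALSE)
train_data <- df[index, ]
test_data  <- df[-index, ]

# Repeated 10-fold CV with folds also stratified by life form, so each fold
# keeps roughly the same tree/grass mix. createMultiFolds returns the
# training-row indices for each of the 10 x 5 resamples.
folds <- createMultiFolds(train_data$forma_vida, k = 10, times = 5)
controle <- trainControl(method = "repeatedcv", number = 10, repeats = 5,
                         index = folds, selectionFunction = "oneSE")

# One model for both classes: the formula interface expands forma_vida into
# a dummy column, so the fitted model can shift predictions by life form.
mod_all <- train(tam_folha ~ ., data = train_data,
                 method = "pls",
                 metric = "RMSE",
                 tuneLength = 4,
                 trControl = controle)

# Held-out RMSE, R-squared and MAE, computed separately for each life form.
pred <- predict(mod_all, newdata = test_data)
por_classe <- sapply(split(seq_len(nrow(test_data)), test_data$forma_vida),
                     function(i) postResample(pred[i], test_data$tam_folha[i]))
print(por_classe)
```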