Improve performance for predictive model creation

1

I'm building a predictive model in R with the caret library. Training in R takes a long time and still produces some errors. By comparison, when I run the same dataset in Weka I get a result within minutes.

I have already converted the variables to integer, but even that did not help much.

I have also tried running the training in parallel, but that did not work very well either.

I would like to know what performance depends on in this case. Which factors contribute most to poor performance when building a predictive model?

    
asked by anonymous 21.02.2017 / 18:30

1 answer

1

There can be several reasons for the slowness:

  • Slow algorithm. randomForest is not the fastest package: try ranger or Rborist. xgboost is also extremely fast and, with a few tweaks, can be configured to fit a random forest.
  • caret is tuning the hyperparameters. By default, train() fits the model for every candidate combination on every resample, so pass a single combination of hyperparameters through the tuneGrid argument.
  • It is quite possible that the R implementations are slower than Weka's (Weka itself is written in Java), but you can call Weka from R (look for the RWeka package).
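As a rough sketch of the first two points, the call below trains a ranger random forest through caret with a one-row tuneGrid, so no hyperparameter search takes place. It assumes the caret and ranger packages are installed. The iris dataset, the parameter values, the 5-fold setting, and num.trees = 200 are placeholders for illustration, not recommendations for your data.

```r
library(caret)  # assumes caret and ranger are installed

set.seed(42)

# One row only: caret fits exactly this combination instead of searching a grid.
# mtry, splitrule and min.node.size are the parameters caret tunes for "ranger".
grid <- data.frame(
  mtry = 2,
  splitrule = "gini",
  min.node.size = 1
)

# 5-fold cross-validation instead of caret's default bootstrap resampling.
ctrl <- trainControl(method = "cv", number = 5)

fit <- train(
  Species ~ .,
  data = iris,          # placeholder dataset
  method = "ranger",    # faster random forest implementation
  trControl = ctrl,
  tuneGrid = grid,
  num.trees = 200       # passed through to ranger()
)

print(fit$results)
```

With a single row in tuneGrid, the cost is roughly one model fit per fold plus the final fit on the full data, rather than that amount multiplied by the number of candidate combinations.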

It is hard to say why the errors happen without seeing your data. My guess is that some of your variables have a rare class, so when you cross-validate, some of the folds end up without it.
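To see how that can happen, here is a small base R sketch with made-up data: a factor whose level "rare" occurs only 3 times in 200 rows, split at random into 10 folds. The variable names and sizes are hypothetical and are not taken from the question.

```r
set.seed(1)

# Hypothetical predictor: the level "rare" appears in only 3 of 200 rows.
x <- factor(c(rep("common", 197), rep("rare", 3)))

# Assign every row to one of k folds at random.
k <- 10
fold <- sample(rep(seq_len(k), length.out = length(x)))

# Count how many "rare" rows land in each fold.
rare_per_fold <- tapply(x == "rare", fold, sum)
print(rare_per_fold)

# With 3 rare rows and 10 folds, at least 7 folds cannot contain the level.
folds_without_rare <- sum(rare_per_fold == 0)
print(folds_without_rare)
```

If something like this shows up in your data, one option is to merge rare levels or drop the affected variables before training. caret's nearZeroVar() can help flag candidates.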

Always look for an R package that trains the model with an algorithm implemented in C/C++. For this part of machine learning, R is best treated as an interface only: a way to use algorithms from several sources more easily and, usually, through a standardized API.

    
21.02.2017 / 19:24