Good afternoon guys, I wonder if you can return a percentage of each attribute used in Random Forest Classifier training to show which attributes are the most deterministic.
Reading the documentation is always a good first step.
In any case, from the manual:
feature_importances_: array of shape = [n_features]
The feature importances (the higher, the more important the feature).
Which, in a free translation:
importance_of_variables_: vector with shape = [number of attributes]. The importance of each variable (the higher the value, the more important the variable).
Just to make it extremely clear: you initialize your model (1), train it (2), and then read off the feature importances (3):
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier() # (1)
clf.fit(x, y) # (2)
print(clf.feature_importances_) # (3)
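Since `feature_importances_` sums to 1.0, multiplying each value by 100 gives exactly the percentage per attribute you asked about. A minimal sketch, using the iris dataset as a stand-in for your own `x` and `y`:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train on a sample dataset (substitute your own x and y here)
iris = load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(iris.data, iris.target)

# feature_importances_ sums to 1.0, so * 100 yields a percentage per attribute
for name, imp in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {imp * 100:.1f}%")
```

The attributes with the largest percentages are the ones the forest relied on most when splitting, i.e. the most "deterministic" ones in your terms.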
This paper proposes a methodology for analyzing the predictions of this type of algorithm, and fortunately there is this Python project that implements it.
This link has a tutorial using RandomForest specifically. I am copying the code below so we do not run the risk of the link going dead.
from __future__ import print_function
import sklearn
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
import numpy as np
import lime
import lime.lime_tabular

np.random.seed(1)

# train the algorithm
iris = sklearn.datasets.load_iris()
train, test, labels_train, labels_test = sklearn.model_selection.train_test_split(iris.data, iris.target, train_size=0.80)
rf = sklearn.ensemble.RandomForestClassifier(n_estimators=500)
rf.fit(train, labels_train)

# explain the predictions
explainer = lime.lime_tabular.LimeTabularExplainer(train, feature_names=iris.feature_names, class_names=iris.target_names, discretize_continuous=True)
i = np.random.randint(0, test.shape[0])
exp = explainer.explain_instance(test[i], rf.predict_proba, num_features=2, top_labels=1)