Lab 3: Classification (cont.)

Random Forests

Random Forests, implemented through the RandomForestClassifier in the sklearn.ensemble package, are one of the most powerful classification techniques, while remaining simple and easy to apply.

The algorithm trains an ensemble of decision trees, whose number is set by the n_estimators parameter. Each tree is trained over a different sample of the original training data, and only a subset of k of the variables describing the data is considered when choosing each split, with k determined by the max_features parameter. Among many other parameters, we can also limit the maximum size of each tree through the max_depth parameter.
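As a minimal sketch of how to train one such forest (assuming trnX, trnY, tstX and tstY already hold the train/test split prepared earlier in the lab, and with parameter values chosen only for illustration):

    from sklearn.ensemble import RandomForestClassifier

    # trnX, trnY, tstX, tstY are assumed to come from the train/test split prepared earlier
    rf = RandomForestClassifier(n_estimators=100,  # number of trees in the ensemble
                                max_depth=10,      # maximum size (depth) of each tree
                                max_features=0.7)  # fraction of variables considered at each split
    rf.fit(trnX, trnY)
    prdY = rf.predict(tstX)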

Next, we can see the results achieved by a set of parameter combinations.
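One possible way to produce such a study is sketched below; the candidate values, the trnX/trnY/tstX/tstY names and the plot layout are assumptions, and the original lab's chart may have been organised differently:

    import matplotlib.pyplot as plt
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    # candidate values (chosen for illustration only)
    n_estimators = [5, 10, 25, 50, 75, 100, 150, 200]
    max_depths = [5, 10, 25]
    max_features = [.3, .5, .7, 1.0]

    best_params, best_accuracy = (), 0
    fig, axs = plt.subplots(1, len(max_depths), figsize=(4 * len(max_depths), 3), squeeze=False)
    for i, d in enumerate(max_depths):
        for f in max_features:
            accuracies = []
            for n in n_estimators:
                rf = RandomForestClassifier(n_estimators=n, max_depth=d, max_features=f)
                rf.fit(trnX, trnY)
                acc = accuracy_score(tstY, rf.predict(tstX))
                accuracies.append(acc)
                if acc > best_accuracy:
                    best_accuracy, best_params = acc, (n, d, f)
            axs[0, i].plot(n_estimators, accuracies, label=f'max_features={f}')
        axs[0, i].set_title(f'max_depth={d}')
        axs[0, i].set_xlabel('n_estimators')
        axs[0, i].set_ylabel('accuracy')
        axs[0, i].legend()
    plt.show()
    print('Best results with', best_params, '-> accuracy =', best_accuracy)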

After the plot you can see the parameters for which the best results were achieved. Let us now look at the performance of that model in terms of other metrics.
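For instance, retraining with the best combination found above and computing the usual metrics could be done as in the following sketch (best_params is the tuple collected in the previous study):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import confusion_matrix, classification_report

    # retrain with the best combination found in the study above
    n, d, f = best_params
    best_rf = RandomForestClassifier(n_estimators=n, max_depth=d, max_features=f)
    best_rf.fit(trnX, trnY)
    prdY = best_rf.predict(tstX)

    print(confusion_matrix(tstY, prdY))       # per-class errors
    print(classification_report(tstY, prdY))  # precision, recall and f1-score per class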

Random forests have the particularity of providing the importance of each variable in the global model. To access those importances, we just need to read the feature_importances_ attribute of the learnt model, as below.
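A possible way to inspect them, using the best_rf model fitted above (variable names could be paired with the scores if the data was loaded into a pandas DataFrame):

    import numpy as np

    importances = best_rf.feature_importances_
    # list the variables from the most to the least important
    for idx in np.argsort(importances)[::-1]:
        print(f'variable {idx}: importance = {importances[idx]:.3f}')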