Positive or negative impact of features in prediction with Random Forest

In classification, when we want to measure the importance of each variable in the random forest algorithm, we usually use Mean Decrease in Gini or Mean Decrease in Accuracy. Is there a metric that captures the positive or negative effect of each variable, not on the predictive accuracy of the model, but on the dependent variable itself? Something like the beta coefficients in a standard linear regression model, but in the context of classification with random forests.

Topic predictor-importance random-forest classification machine-learning

Category Data Science


With decision trees you cannot directly get the positive or negative effect of each variable, as you would with, say, the coefficients of a linear regression. It's just not the way decision trees work. As you point out, the training process consists of finding the best feature and split point at each node, judged by the Gini index or the mutual information with the target variable. But no parameters are learned during this process that we could use for such an analysis.
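To illustrate the contrast (a minimal sketch; the dataset and hyperparameters are illustrative choices, not from the question): a random forest's Gini-based `feature_importances_` are non-negative magnitudes with no direction, while a linear model's coefficients carry a sign.

```python
# Sketch: random-forest importances are unsigned magnitudes, whereas
# logistic-regression coefficients are signed (assumed example setup).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Mean Decrease in Gini: every importance is >= 0, so there is no
# way to read off whether a feature pushes toward class 0 or class 1.
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(all(imp >= 0 for imp in rf.feature_importances_))  # always True

# A linear model, by contrast, has signed coefficients: the sign tells
# you the direction of the feature's effect on the log-odds.
lr = LogisticRegression().fit(StandardScaler().fit_transform(X), y)
print(any(c < 0 for c in lr.coef_[0]))
```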

A common tool for this purpose is SHAP. In fact, there is a specific explainer for tree-based models, the TreeExplainer. With SHAP you can get both the contribution of each feature in pushing the prediction toward one class or the other, and an overall view of the contributions of all features.
