Is there a way to output feature importance based on the outputted class?

I'm running a random forest classifier in Python (two classes). I am using the feature_importances_ attribute of RandomForestClassifier to get feature importances.

This gives a nice overall ranking of importances, but it offers no insight into which features mattered most for each class. For example, some features may be important for predicting class 1, while different features may be more important for predicting class 2.

Is it possible to split feature importance based on the predicted class?

Topics: explainable-ai, random-forest, feature-selection, python

Category: Data Science


You can use sklearn.inspection.permutation_importance for this purpose.

Split your test data by the predicted labels and pass each subset to the method separately; the resulting importances then reflect how much each feature contributes to the model's score within that subset.

Code example

from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

model = RandomForestClassifier()
model.fit(x_train, y_train)

def plot_feature_importances(model, x, y, title):
    # Permute each feature n_repeats times and record the mean drop in score
    result = permutation_importance(model, x, y, n_repeats=100, random_state=0)
    df = pd.DataFrame({'feature_name': x.columns,
                       'feature_importance': result.importances_mean})
    plt.figure(figsize=(8, 2))
    sns.barplot(data=df, x='feature_importance', y='feature_name')
    plt.title(title)
    plt.show()

# Importances over the full test set
plot_feature_importances(model, x_test, y_test, 'All test data')

# Split the test set by predicted label and compute importances per subset
y_pred = model.predict(x_test)
plot_feature_importances(model, x_test[y_pred == 1], y_test[y_pred == 1], 'Predicted as "1"')
plot_feature_importances(model, x_test[y_pred == 0], y_test[y_pred == 0], 'Predicted as "0"')
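The snippet assumes x_train, x_test, y_train, and y_test already exist, with the features in a pandas DataFrame (the function reads x.columns). If you want a fully self-contained demo, a minimal synthetic setup could look like the sketch below; make_classification and the feature names here are purely illustrative, not part of the original question.

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic two-class dataset, for illustration only
X, y = make_classification(n_samples=500, n_features=5, n_informative=3, random_state=0)
X = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(X.shape[1])])

# Stratified split so both classes appear in the test set
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

Run this before the snippet above; the per-class plots can then differ from the overall one, which is exactly the per-class breakdown the question asks for.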

