Does "feature importance" depend on the model type?
I was working on a small classification problem (the breast cancer data set from sklearn) and trying to decide which features were most important for predicting the labels. I understand that there are several ways to define feature importance here (permutation importance, impurity-based importance in trees, ...), but I did the following: 1) rank the features by absolute coefficient value in a logistic regression; 2) rank the features by feature importance from a random forest. These don't quite tell the same story, and I'm thinking that a feature that is unimportant in a linear model could still be very discriminative in a non-linear model that can exploit it.
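A minimal sketch of the comparison I did (assuming scikit-learn defaults; I standardize the features so the logistic regression coefficients are on a comparable scale):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target
feature_names = np.asarray(data.feature_names)

# 1) Rank features by absolute coefficient in a (standardized) logistic regression.
X_std = StandardScaler().fit_transform(X)
logreg = LogisticRegression(max_iter=5000).fit(X_std, y)
logreg_rank = np.argsort(-np.abs(logreg.coef_[0]))

# 2) Rank features by impurity-based importance from a random forest.
rf = RandomForestClassifier(random_state=0).fit(X, y)
rf_rank = np.argsort(-rf.feature_importances_)

print("Top 5 by |coef| (logreg):", feature_names[logreg_rank[:5]])
print("Top 5 by importance (RF):", feature_names[rf_rank[:5]])
```

The two top-5 lists typically overlap only partially, which is what prompted the question.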
Is that true in general? Or should important features (those that contribute most to a classification score) be the same across all types of models?
Topic predictor-importance feature-engineering feature-selection
Category Data Science