Does "feature importance" depend on the model type?

I was working on a small classification problem (the breast cancer data set from sklearn) and trying to decide which features were most important for predicting the labels. I understand that there are several ways to define feature importance here (permutation importance, importance in trees, ...), but I did the following: 1) rank the features by coefficient value in a logistic regression; 2) rank the features by feature importance from a random forest. These don't quite tell the same story, and I'm thinking that a feature that is unimportant in a linear model could be very discriminative in a non-linear model that can make use of it.

Is that true in general? Or should important features (those that contribute most to a classification score) be the same across all types of models?

Topic predictor-importance feature-engineering feature-selection

Category Data Science


When it comes to feature importance, I always go with a model-agnostic measure. As you rightly mention, two different models will interpret importance in different terms: linear models through their coefficients, and tree-based models through the information gain (impurity decrease) attributed to each feature.
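To make the difference concrete, here is a minimal sketch that ranks the breast cancer features both ways. It assumes scikit-learn is installed. The standardization step, the use of absolute coefficient values, and the hyperparameters are my own choices for illustration, not something taken from the question.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target
names = np.array(data.feature_names)

# Assumption: standardize the inputs so that coefficient magnitudes
# are comparable across features measured on different scales.
logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logreg.fit(X, y)
coef_importance = np.abs(logreg.named_steps["logisticregression"].coef_[0])

# Impurity-based importance from a random forest (sums to 1 over features).
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
forest_importance = forest.feature_importances_

top_k = 5
top_logreg = names[np.argsort(coef_importance)[::-1][:top_k]]
top_forest = names[np.argsort(forest_importance)[::-1][:top_k]]

print("Top features by |logistic regression coefficient|:", list(top_logreg))
print("Top features by random forest importance:", list(top_forest))
```

The two lists are computed on different scales and from different definitions, so only the orderings are worth comparing, not the raw numbers.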

So you have already mentioned one measure that depends not on the model type but on the metric you are interested in: permutation importance. It does not care which model you are using, only about the impact that a feature has on the model's overall performance.
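As a rough sketch of how this looks in practice, the snippet below applies scikit-learn's `permutation_importance` to a logistic regression and a random forest on a held-out split. The train/test split, the accuracy metric, and the number of repeats are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
names = np.array(data.feature_names)
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, stratify=data.target, random_state=0
)

models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for label, model in models.items():
    model.fit(X_train, y_train)
    # Shuffle one feature at a time on held-out data and record
    # how much the chosen score (accuracy here) drops.
    results[label] = permutation_importance(
        model, X_test, y_test, scoring="accuracy", n_repeats=5, random_state=0
    )
    order = results[label].importances_mean.argsort()[::-1][:5]
    print(label, "->", list(names[order]))
```

The procedure is the same for any fitted estimator, but the scores are still computed for one particular fitted model, so the two rankings need not agree. With strongly correlated features, as in this data set, shuffling a single feature may also change the score very little.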

This reference might give you a better idea of the advantages of permutation importance over tree-based importance: Permutation Importance vs Random Forest Feature Importance


Your intuition so far is correct. Feature importance does not carry over across models. The feature scores from an xgboost model might be irrelevant, or even misleading, when training another model. There is no perfect way to define important features; it requires some prior knowledge about the data.
