Can we use an independent t-test as a metric for feature importance?
I have a supervised binary classification problem. I tuned an xgboost model on the training set and achieved reasonably high accuracy on the test set. Now I want to interpret the model's results.
I used the SHAP library to interpret the results, and for the most part they are consistent with what I would expect. However, there is one feature that, when ranked by mean absolute SHAP value, comes out 7th most important, and I would have expected it to be higher. If I perform an independent t-test on that feature between the positive and negative groups, there is a clear statistical difference between the two (p < 0.05), which suggests the feature should be very predictive of the class. What could be the cause of this discrepancy?
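To make the question concrete, here is a minimal synthetic sketch (all variable names and the data-generating setup are hypothetical, not from my actual problem) of how a feature can pass a marginal t-test with a tiny p-value yet carry little *additional* information once a correlated feature is available to the model, which is closer to what SHAP measures:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: x1 carries the class signal directly;
# x2 is correlated with x1 and inherits the signal only through it.
n = 1000
y = rng.integers(0, 2, size=n)
x1 = y + rng.normal(0, 1, size=n)            # strongly predictive feature
x2 = 0.8 * x1 + rng.normal(0, 0.6, size=n)   # correlated "redundant" feature

# Marginal view: independent t-test of x2 between the two classes.
t_marg, p_marg = stats.ttest_ind(x2[y == 1], x2[y == 0])
print(f"marginal:    t = {t_marg:.2f}, p = {p_marg:.3g}")

# Conditional view: regress x1 out of x2, then t-test the residual.
# Once x1 is accounted for, x2 adds little, mirroring how a model
# that already uses x1 assigns x2 a low SHAP importance.
slope, intercept = np.polyfit(x1, x2, 1)
resid = x2 - (slope * x1 + intercept)
t_cond, p_cond = stats.ttest_ind(resid[y == 1], resid[y == 0])
print(f"conditional: t = {t_cond:.2f}, p = {p_cond:.3g}")
```

In this sketch the marginal t-test on x2 is highly significant, while the test on the residual (after conditioning on x1) is far weaker, so a significant t-test and a low SHAP ranking are not contradictory when features are correlated.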
Topic: shap, xgboost, classification, statistics
Category: Data Science