Why are SHAP values not an indication of cause?

I have trained an XGBoost Classifier and I am now trying to explain how and, most importantly, why the model has made the predictions it's made.
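For context, this is roughly how I compute the explanations (a minimal sketch; `model` and `X` are placeholders for my fitted XGBClassifier and my feature matrix, which I keep as a pandas DataFrame):

```python
# Minimal sketch, not my exact pipeline: `model` is a fitted XGBClassifier,
# `X` is the feature matrix as a pandas DataFrame.
import shap

explainer = shap.TreeExplainer(model)    # exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X)   # one attribution per feature per row
```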

In the documentation entry Be careful when interpreting predictive models in search of causal insights, I have read that SHAP values are indicative of correlation but not causation.

More specifically:

SHAP makes transparent the correlations picked up by predictive ML models. But making correlations transparent does not make them causal! All predictive models implicitly assume that everyone will keep behaving the same way in the future, and therefore correlation patterns will stay constant.

I don't clearly understand why I can't use SHAP values as an indication of what would happen if the value of a certain predictive feature changed.

For example, I have a SHAP dependence plot for one predictive feature (with the colouring by an interacting feature disabled).
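A dependence plot of that kind is typically produced like this (hedged sketch; `shap_values` and `X` come from the step above, and `"feature_name"` is a placeholder for the actual column name):

```python
# Sketch of the dependence plot call; "feature_name" is a placeholder,
# shap_values and X are the objects from the TreeExplainer step above.
import shap

shap.dependence_plot(
    "feature_name",          # feature shown on the x-axis
    shap_values,             # SHAP values computed for X
    X,                       # feature matrix used for the explanations
    interaction_index=None,  # turn off colouring by a second, interacting feature
)
```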

The relationship between the predictive feature and the binary target is not linear (as expected, since I have trained an XGBoost classifier). The predictive features are also not independent of one another: the dataset is a credit risk dataset, so the features cannot be independent.

Why can't I say that, in this example, higher values of the feature imply a higher predicted probability (the target is binary, so the model estimates the probability of falling into class 1)?
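To make my confusion concrete, here is a small synthetic example of the kind of situation I think the documentation is warning about (entirely my own construction, not taken from the linked entry): a confounder `z` drives both the feature `x` and the target, `x` itself has no causal effect, yet the model and SHAP still give `x` large attributions.

```python
# Synthetic illustration (my own toy construction): x is correlated with the
# target only through the confounder z, so x has no causal effect on y.
import numpy as np
import shap
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                                    # hidden confounder
x = z + rng.normal(scale=0.5, size=n)                     # correlated with z, no causal effect on y
y = (z + rng.normal(scale=0.5, size=n) > 0).astype(int)   # target depends only on z

X = x.reshape(-1, 1)                                      # the model only sees x
model = XGBClassifier(n_estimators=100, max_depth=3).fit(X, y)

shap_values = shap.TreeExplainer(model).shap_values(X)
print("mean |SHAP| for x:", np.abs(shap_values).mean())
# x receives substantial credit, yet changing x while z stays fixed would not
# change anyone's true probability of y = 1.
```

In this toy setup the dependence plot for `x` would show exactly the kind of "higher value, higher SHAP" pattern I described, which is what makes me unsure where the causal reading breaks down.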

Tags: cause-and-effect, shap, xgboost, correlation

