Is multicollinearity a problem when interpreting SHAP values from an XGBoost model?

I'm using an XGBoost model for multi-class classification and am looking at feature importance using SHAP values. Is multicollinearity a problem for the interpretation of the SHAP values? As far as I know, XGBoost is not affected by multicollinearity, so I assume SHAP won't be affected either?

Topic shap explainable-ai xgboost machine-learning

Category Data Science


Shapley values are designed to deal with this kind of problem: dividing credit among contributors whose effects depend on one another. You might want to have a look at the literature.

They are based on the idea of a cooperative game, and the goal is to compute each player's contribution to the total payoff of the game.

Let's say you are watching the football Champions League final, Real Madrid vs Liverpool, and Madrid only has 3 players (1, 2 and 3), who somehow score 5 goals.

To calculate the Shapley value of each player, you take a weighted average of that player's marginal contribution over every combination of teammates:

$S_1 = \frac{1}{3}\left( v(\{1,2,3\}) - v(\{2,3\})\right) + \frac{1}{6}\left( v(\{1,2\}) - v(\{2\})\right) + \frac{1}{6}\left( v(\{1,3\}) - v(\{3\})\right) + \frac{1}{3}\left( v(\{1\}) - v(\emptyset)\right)$

$S_2 = \frac{1}{3}\left( v(\{1,2,3\}) - v(\{1,3\})\right) + \frac{1}{6}\left( v(\{1,2\}) - v(\{1\})\right) + \frac{1}{6}\left( v(\{2,3\}) - v(\{3\})\right) + \frac{1}{3}\left( v(\{2\}) - v(\emptyset)\right)$

$S_3 = \frac{1}{3}\left( v(\{1,2,3\}) - v(\{1,2\})\right) + \frac{1}{6}\left( v(\{1,3\}) - v(\{1\})\right) + \frac{1}{6}\left( v(\{2,3\}) - v(\{2\})\right) + \frac{1}{3}\left( v(\{3\}) - v(\emptyset)\right)$

Here $v$ is the value function of a set of players: for Real Madrid, the number of goals scored by each combination of players.
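
The weights $\frac{1}{3}$ and $\frac{1}{6}$ in the sums above are not arbitrary. They follow from the general Shapley formula, written here with $N$ for the set of all players, $n = |N|$ and $T$ for a coalition that does not contain player $i$ (this notation is introduced only for this note):

$S_i = \sum_{T \subseteq N \setminus \{i\}} \frac{|T|!\,(n - |T| - 1)!}{n!}\left( v(T \cup \{i\}) - v(T)\right)$

With $n = 3$, a coalition of size 0 or 2 gets weight $\frac{0!\,2!}{3!} = \frac{2!\,0!}{3!} = \frac{1}{3}$, and each of the two coalitions of size 1 gets weight $\frac{1!\,1!}{3!} = \frac{1}{6}$, which matches the coefficients used for $S_1$, $S_2$ and $S_3$.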

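As you can see, the theoretical definition encapsulates the dependence between features, because every marginal contribution is measured with the other players present. The theory also tells you that the contributions sum to the total value of the game (the prediction, in the model setting), assuming $v(\emptyset) = 0$: $S_1 + S_2 + S_3 = 5$.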
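
To make the sum property concrete, here is a minimal sketch in Python. The goal counts in `v` are made up for illustration (the example above only fixes the total at 5 goals), and `shapley_values` is a hypothetical helper written for this post, not a function from the `shap` library.

```python
from itertools import combinations
from math import factorial

# Hypothetical goals scored by each combination of players.
# Only the full-team total (5) comes from the example; the rest is assumed.
v = {
    frozenset(): 0,
    frozenset({1}): 1,
    frozenset({2}): 1,
    frozenset({3}): 2,
    frozenset({1, 2}): 2,
    frozenset({1, 3}): 4,
    frozenset({2, 3}): 3,
    frozenset({1, 2, 3}): 5,
}


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition of the other players."""
    n = len(players)
    result = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition with `size` members that excludes player i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of player i when joining this coalition.
                total += weight * (value[coalition | {i}] - value[coalition])
        result[i] = total
    return result


phi = shapley_values([1, 2, 3], v)
print(phi)                # per-player contributions
print(sum(phi.values()))  # should match v({1,2,3}) - v(empty set)
```

Exhaustive enumeration like this only works for a handful of players, since the number of coalitions doubles with each one added. That cost is the reason SHAP relies on model-specific algorithms and approximations in practice.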

Let's see if the RM players get some high Shapley values next week.
