Interpretation of SHAP summary plot in a multi class context

I'm performing multi-class classification and using SHAP values to interpret the features. I have 3 classes. I have tested XGBoost and multinomial logistic regression. When I'm using XGBoost I am able to get a summary plot where I can see each feature's effect on all three classes. I'm also able to get a separate plot for each class to see how small/large feature values affect the prediction towards the individual class. It seems like this is only possible to …
Category: Data Science

Is multicollinearity a problem when interpreting SHAP values from an XGBoost model?

I'm using an XGBoost model for multi-class classification and am looking at feature importance by using SHAP values. I'm curious whether multicollinearity is a problem for the interpretation of the SHAP values. As far as I know, XGBoost is not affected by multicollinearity, so I assume SHAP won't be affected because of that?
Category: Data Science

SHAP values interpretation for classification

I'm trying to understand how SHAP values are calculated for classification. As far as I understand, for each feature the SHAP values are calculated by: $$ \phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!(|F|-|S|-1)!}{|F|!} \left[ f_{S\cup\{i\}} (x_{S\cup\{i\}})-f_S(x_S) \right] $$ For regression it makes sense that for three features $\{A,B,C\}$ each feature has a value. The prediction for one row might be $\{A,B,C\} = 50$. Then all possible coalitions are calculated with and without the feature to find the marginal contribution …
Category: Data Science
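
The formula quoted in the question above can be evaluated by brute force when there are only a few features. The following is a minimal sketch, not the shap package's implementation: `toy_model`, `x`, and `baseline` are hypothetical, and "removing" a feature is approximated by replacing it with a single baseline value, which is only one of several possible conventions. For a classifier, the same computation would be repeated once per class, with `toy_model` standing in for that class's score.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition S of F minus {i}."""
    features = list(range(n_features))
    phi = [0.0] * n_features
    for i in features:
        others = [j for j in features if j != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (|F| - |S| - 1)! / |F|!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition S
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy model with an additive term and an interaction term.
def toy_model(a, b, c):
    return 2.0 * a + b * c


x = (3.0, 2.0, 4.0)          # the row being explained (assumed)
baseline = (1.0, 1.0, 1.0)   # reference values for "absent" features (assumed)


def value_fn(coalition):
    # Features in the coalition keep their value from x; the rest use the baseline.
    args = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return toy_model(*args)


phi = exact_shapley(value_fn, len(x))
print(phi)
print(sum(phi), toy_model(*x) - toy_model(*baseline))
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that makes the per-feature values interpretable.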

Exact Shap calculations for logistic regression?

Given the relatively simple form of the standard logistic regression model, I was wondering if there is an exact calculation of SHAP values for logistic regression. To be clear, I am looking for a closed formula depending on features ($X_i$) and coefficients ($\beta_i$) to calculate Shapley values and their corresponding importance.
Category: Data Science
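
One partial answer to the question above: in log-odds space a standard logistic regression is a linear function of the features, so if features are treated as independent, the linear-model form $\beta_i (x_i - \text{mean}_i)$ applies to the log-odds output. The sketch below checks that numerically. The coefficients, background rows, and `x` are hypothetical, and the closed form is only claimed here for the log-odds, not for the probability, because the sigmoid is not additive.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


# Hypothetical fitted coefficients and background data (assumed for illustration).
intercept = -0.5
beta = [0.8, -1.2, 0.3]
background = [
    [1.0, 0.0, 2.0],
    [3.0, 1.0, 0.0],
    [2.0, 2.0, 4.0],
]
x = [2.5, 0.5, 1.0]  # the row being explained


def log_odds(row):
    return intercept + sum(b * v for b, v in zip(beta, row))


n = len(background)
feature_means = [sum(row[j] for row in background) / n for j in range(len(beta))]

# Closed-form contributions in log-odds space: beta_i * (x_i - mean_i)
phi = [b * (xi - m) for b, xi, m in zip(beta, x, feature_means)]

# Additivity check: base value plus contributions recovers the log-odds of x.
expected_log_odds = sum(log_odds(row) for row in background) / n
reconstructed = expected_log_odds + sum(phi)
probability = sigmoid(reconstructed)

print(phi, reconstructed, log_odds(x), probability)
```

Mapping the reconstructed log-odds through the sigmoid recovers the predicted probability, but the individual `phi` values remain contributions to the log-odds.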

Can we use an independent t-test as a metric for feature importance?

I have a supervised binary classification problem. I tuned an xgboost model on the training set and achieved a reasonably high accuracy on the test set. Now I want to interpret the results of the model. I used the SHAP library to interpret the results and for the most part, they are consistent with what I would expect. However, there is one feature that, if we average over all shap values, is ranked as 7th most important and I would …
Category: Data Science

What are the SHAP values for a linear model? How do we derive them?

What are the SHAP values for a linear model? The documentation gives the following: "Assuming features are independent leads to interventional SHAP values which for a linear model are coef[i] * (x[i] - X.mean(0)[i]) for the ith feature." Can someone explain to me how this is derived, or direct me to a resource that explains it?
Category: Data Science
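
A sketch of how the quoted formula can be obtained from the Shapley definition, assuming a linear model $f(x) = \beta_0 + \sum_j \beta_j x_j$ and an interventional value function $v(S)$ that fixes the features in $S$ at their observed values and averages over the rest. The symbols $v$, $w$, and $\beta_0$ are notation introduced here, not taken from the documentation.

```latex
\begin{align*}
v(S) &= \mathbb{E}\!\left[ f(x_S, X_{F \setminus S}) \right]
      = \beta_0 + \sum_{j \in S} \beta_j x_j + \sum_{j \notin S} \beta_j \,\mathbb{E}[X_j] \\
v(S \cup \{i\}) - v(S) &= \beta_i x_i - \beta_i \,\mathbb{E}[X_i]
      = \beta_i \left( x_i - \mathbb{E}[X_i] \right) \\
\phi_i &= \sum_{S \subseteq F \setminus \{i\}} w(S)\, \beta_i \left( x_i - \mathbb{E}[X_i] \right),
      \qquad w(S) = \frac{|S|!\,(|F|-|S|-1)!}{|F|!} \\
       &= \beta_i \left( x_i - \mathbb{E}[X_i] \right)
      \quad \text{because } \sum_{S \subseteq F \setminus \{i\}} w(S) = 1
\end{align*}
```

The marginal contribution does not depend on the coalition $S$, so the weighted sum collapses to a single term. Estimating $\mathbb{E}[X_i]$ by the sample mean gives coef[i] * (x[i] - X.mean(0)[i]).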

SHAP KernelExplainer AttributeError numpy.ndarray

I've developed a text classifier in the form of a Python function that takes an np.array of strings as input (each string is one observation):

    def model(vector_of_strings):
        ...  # do something
        return vec_of_probabilities  # like [0.1, 0.23, ..., 0.09]

When I try to use KernelExplainer from the shap package like this:

    test_texts = pd.Series(['text1', 'text2', 'text3'])
    shap.KernelExplainer(model, test_texts)

I receive the following error:

    AttributeError: 'numpy.ndarray' object has no attribute 'find'

What can I do about it?
Category: Data Science

references on how to use shap values without the shap package

I am familiar with the shap Python package and how to use it. I also have a pretty good idea about SHAP values in general, but it is still new to me. What I'm requesting are references (ideally custom Python code in blog posts) that explain how to take an array of raw SHAP values (of shape num_features X num_samples) and get:
- feature importance
- interaction terms
- any other calculations the shap package does
My motivation for this is that I …
Category: Data Science
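
For the first item in the question above, a common global importance measure is the mean absolute SHAP value per feature, averaged across samples. The sketch below uses plain Python on a made-up array laid out as num_features X num_samples, as in the question. The values and feature names are hypothetical. Interaction terms are not covered, since they cannot be recovered from the raw per-feature values alone and need access to the model.

```python
# Hypothetical raw SHAP values, one row per feature and one column per sample.
shap_values = [
    [0.5, -0.7, 0.6, -0.2],
    [0.1, 0.0, -0.1, 0.2],
    [-1.0, 1.5, -0.5, 1.0],
]
feature_names = ["f0", "f1", "f2"]  # assumed names for illustration


def mean_abs_importance(values):
    """Global importance per feature: mean of |SHAP value| over all samples."""
    return [sum(abs(v) for v in row) / len(row) for row in values]


importance = mean_abs_importance(shap_values)

# Rank features from most to least important.
ranking = sorted(zip(feature_names, importance), key=lambda pair: pair[1], reverse=True)

for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking the absolute value before averaging matters: positive and negative contributions for the same feature would otherwise cancel and understate its importance.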

Does shap value use the target variable in its calculation?

I saw this answer mentioning that shap uses the target variable, but I can get SHAP values without the target variable using the shap package, for example:

    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(x_val)

where model is my loaded model, which does not contain target data, and x_val is a matrix of features X samples, which does not contain my target data. I know shap is an approximation of Shapley values; does this have something to do with it?
Topic: shap python
Category: Data Science

Sample size for SHAP explainer and range of a SHAP value

I am working on a binary classification problem with 977 records and a 77:23 class proportion. I used a random forest model. Based on my attempt to run the SHAP package, I got the below plots. I also see that SHAP requires us to select a sample size to get the SHAP values, as shown here in this post. When SHAP does not use the same assumption as the LIME neighborhood, why does it require a sample size to be mentioned? To summarize, my questions are as …
Category: Data Science

How to interpret SHAP summary plot?

I already referred to these posts here and here, so please don't mark it as a duplicate. I am doing a binary classification using random forest, and the class labels are 1 and 0. What is the likelihood that a supplier will meet the target? I got the below output from the SHAP summary plot. How do I know which feature leads to class 1 and class 0? Does it mean high values of each feature lead to class 1? And low values of each …
Category: Data Science

Are predictive features with 0 SHAP values included in the model?

I have trained an XGBoost model, enforcing no feature interactions, and calculated global SHAP values. It looks like only 6 features have non-zero SHAP values, whilst the remaining ones have a SHAP value of 0. Question: if a feature has a SHAP value of 0 across all records in the sample, does it mean that the feature has not been included in the model?
Category: Data Science

Why are SHAP values not an indication of cause?

I have trained an XGBoost Classifier and I am now trying to explain how and, most importantly, why the model has made the predictions it's made. In the documentation entry Be careful when interpreting predictive models in search of causal insights, I have read that SHAP values are indicative of correlation but not causation. More specifically: SHAP makes transparent the correlations picked up by predictive ML models. But making correlations transparent does not make them causal! All predictive models implicitly …
Category: Data Science

BorutaShap implementation

I want to use BorutaShap for feature selection in my model. I have my train_x as a numpy.ndarray and I want to pass it to the BorutaShap instance. When I try to fit, I get this error:

    AttributeError: 'numpy.ndarray' object has no attribute 'columns'

Below is my code:

    num_trans = Pipeline(steps=[('impute', SimpleImputer(strategy='mean')), ('scale', StandardScaler())])
    cat_trans = Pipeline(steps=[('impute', SimpleImputer(strategy='most_frequent')), ('encode', OneHotEncoder(handle_unknown='ignore'))])
    from sklearn.compose import ColumnTransformer
    preproc = ColumnTransformer(transformers=[('cat', …
Category: Data Science

Do monotonic constraints prevent an XGBoost model from capturing non-linear relationships in the data?

I have trained an XGBoost model (for a binary classification problem) and I have tested two scenarios: Scenario 1 - No monotonic constraints applied. In this case I get a Gini on the training sample of 81.1 and a Gini on the validation sample of 76.5, raising a red flag for overfitting. I have taken a look at the SHAP dependence plot for one char in Scenario 1 and it looks like this: Scenario 2 - Monotonic constraints applied. In …
Category: Data Science

Differences between Feature Importance and SHAP variable importance graph

I have run an XGBClassifier using the following fields:
- predictive features = ['sbp','tobacco','ldl','adiposity','obesity','alcohol','age']
- binary target = 'Target'
I have produced the following feature importance plot: I understand that, generally speaking, importance provides a score that indicates how useful or valuable each feature was in the construction of the boosted decision trees within the model. The more an attribute is used to make key decisions with decision trees, the higher its relative importance. From the list of 7 predictive …
Category: Data Science

Why does SHAP's TreeExplainer with "interventional" method not match exact SHAP?

I am trying to understand the concepts/definitions behind the SHAP method of explaining model predictions. In particular I've read the original SHAP paper and the TreeExplainer paper. The original paper lays out a particular, well-defined set of values for a given model prediction on tabular data, which can be computed exactly (although this is very slow in practice, so the paper/package gives various other algorithms as "approximations"). In the TreeExplainer paper, Algorithms 1 & 2 for "TreeExplainer with path-dependent feature …
Category: Data Science

aggregation of feature importance

I have more of a conceptual question I was hoping to get some feedback on. I am trying to run a boosted regression ML model to identify a subset of important predictors for some clinical condition. The dataset includes over 100000 rows, and close to 1000 predictors. Now, the etiology of the disease we are trying to predict is largely unknown. Thus, we likely don’t have data on many important predictors for the condition. That is to say, as a …
Category: Data Science

LSTM Shapley Deep Explainer TimeseriesGenerator Keras

I have this data in the form:

    X_train shape: (2724, 10), y_train shape: (2724,)
    X_test shape: (682, 10), y_test shape: (682,)

which I feed into Keras' TimeseriesGenerator:

    window_length = 63
    batch_size = 32
    train_generator = TimeseriesGenerator(X_train, y_train, length=window_length, sampling_rate=1, batch_size=batch_size, stride=1)
    test_generator = TimeseriesGenerator(X_test, y_test, length=window_length, sampling_rate=1, batch_size=batch_size, stride=1)

This is the type of train_generator:

    type(train_generator): <class 'tensorflow.python.keras.preprocessing.sequence.TimeseriesGenerator'>

Each sequence looks like this:

    for i in range(len(train_generator)):
        x, y = train_generator[i]
        print(x.shape, y.shape)

    (32, 63, 10) (32,)
    (32, …
Category: Data Science

How are two linear models with features f1 and (C-f1) similar or different?

I am training a linear model. I'm planning to update this model every month. I have two perfectly correlated features such that f1+f2=C, where C is a constant. Since I cannot include both, I will be including just f1. I have a dashboard where I am showcasing the results. If I want to see the effect of both features, what should I do? Or how do I interpret the result of f2 given the coefficient and feature importance of f1? I'm …
Category: Data Science
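
Because f2 = C - f1, substituting into y = b0 + b1*f1 gives y = (b0 + b1*C) - b1*f2, so a model fitted on f2 alone should have the negated slope and a shifted intercept. The sketch below checks this with a one-feature least-squares fit written in plain Python. The data, `C`, and the helper `fit_line` are hypothetical and stand in for whatever linear model is actually being trained.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


C = 10.0                             # assumed constant with f1 + f2 = C
f1 = [1.0, 2.0, 4.0, 7.0, 9.0]       # hypothetical feature values
f2 = [C - v for v in f1]
y = [2.1, 3.9, 8.2, 13.8, 18.1]      # hypothetical target values

b0_f1, b1_f1 = fit_line(f1, y)
b0_f2, b1_f2 = fit_line(f2, y)

# Per-row contributions relative to the feature mean, for each parameterisation.
mean_f1 = sum(f1) / len(f1)
mean_f2 = sum(f2) / len(f2)
contrib_f1 = [b1_f1 * (v - mean_f1) for v in f1]
contrib_f2 = [b1_f2 * (v - mean_f2) for v in f2]

print(b0_f1, b1_f1)
print(b0_f2, b1_f2)
```

The slopes have opposite signs, but the mean-centred contribution of each row is identical in both fits, so the two models carry the same information and differ only in how the effect is labelled.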
