shap - Geeks Mental

Interpretation of SHAP summary plot in a multi class context

hideonbush

May 26, 2022 14:54

I'm performing multi-class classification and using SHAP values to interpret the features. I have 3 classes. I have tested XGBoost and multinomial logistic regression. When I'm using XGBoost, I am able to get a summary plot where I can see each individual feature's effect on all three classes. I'm also able to get a separate plot for each class to see how small/large feature values affect the prediction towards the individual class. It seems like this is only possible to …

Topic: shap xgboost python

Category: Data Science

Is multicollinearity a problem when interpreting SHAP values from an XGBoost model?

hideonbush

May 25, 2022 11:56

I'm using an XGBoost model for multi-class classification and am looking at feature importance by using SHAP values. I'm curious whether multicollinearity is a problem for the interpretation of the SHAP values. As far as I know, XGBoost is not affected by multicollinearity, so I assume SHAP won't be affected either?

Topic: shap explainable-ai xgboost machine-learning

Category: Data Science

SHAP values interpretation for classification

Plewis

May 22, 2022 15:07

I'm trying to understand how SHAP values are calculated for classification. As far as I understand, for each feature the SHAP values are calculated by: $$ \phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!} \left[ f_{S\cup\{i\}} (x_{S\cup\{i\}})-f_S(x_S) \right] $$ For regression it makes sense that for three features $\{A,B,C\}$ each feature has a value. The prediction for one row might be $\{A,B,C\} = 50$. Then all possible coalitions are calculated with and without the feature to find the marginal contribution …

Topic: shap explainable-ai machine-learning

Category: Data Science
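
The formula quoted in the question above can be evaluated by brute force for a handful of features. The following is a minimal sketch, not the shap package's algorithm: `shapley_values` and `toy_value` are hypothetical names, and the toy value function (additive effects plus an A-B interaction, chosen so the full coalition is worth 50) is an assumption made purely for illustration. For classification, the same enumeration would be applied to one model output at a time (for example the score or probability of a single class), which gives one set of values per class.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values by enumerating every coalition S of the other features.

    `value` maps a frozenset of feature indices to a number (f_S in the formula).
    """
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # |S|! (|F| - |S| - 1)! / |F|!
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(s):
    """Hypothetical coalition value: features A=0, B=1, C=2 (illustration only)."""
    base = {0: 10.0, 1: 20.0, 2: 5.0}
    total = sum(base[j] for j in s)
    if 0 in s and 1 in s:
        total += 15.0  # interaction between A and B
    return total


phi = shapley_values(toy_value, 3)
print(phi)       # one value per feature
print(sum(phi))  # equals value(all features) - value(no features)
```

The values sum to the difference between the full-coalition value and the empty-coalition value, which is the additivity property that makes each feature's number interpretable as a share of the prediction.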

Exact Shap calculations for logistic regression?

lcrmorin

May 18, 2022 09:15

Given the relatively simple form of the standard logistic regression model, I was wondering if there is an exact calculation of SHAP values for logistic regression. To be clear, I am looking for a closed-form formula depending on features ($X_i$) and coefficients ($\beta_i$) to calculate Shapley values and their corresponding importance.

Topic: shap explainable-ai logistic-regression

Category: Data Science

Can we use an independent t-test as a metric for feature importance?

Evan

May 17, 2022 15:08

I have a supervised binary classification problem. I tuned an xgboost model on the training set and achieved a reasonably high accuracy on the test set. Now I want to interpret the results of the model. I used the SHAP library to interpret the results and for the most part, they are consistent with what I would expect. However, there is one feature that, if we average over all shap values, is ranked as 7th most important and I would …

Topic: shap xgboost classification statistics

Category: Data Science

What are the SHAP values for a linear model? How do we derive them?

NAS

May 2, 2022 15:31

What are the SHAP values for a linear model? The documentation gives them as follows: "Assuming features are independent leads to interventional SHAP values which for a linear model are coef[i] * (x[i] - X.mean(0)[i]) for the ith feature." Can someone explain how this is derived, or direct me to a resource that explains it?

Topic: linear-models shap explainable-ai machine-learning

Category: Data Science
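
A sketch of why the formula in the question above holds: if features are treated as independent, the value of a coalition S for a linear model is the intercept plus coef[j] * x[j] for features in S plus coef[j] * mean(X[j]) for features outside S. Adding feature i to any coalition therefore changes the value by coef[i] * (x[i] - mean(X[i])), regardless of S, and since the Shapley weights sum to 1, that constant is the Shapley value. The snippet below checks the additivity numerically; the coefficients, intercept, and background data are hypothetical, and plain lists are used instead of NumPy to keep it dependency-free.

```python
# Hypothetical linear model and background data (illustration only).
coef = [2.0, -1.0, 0.5]
intercept = 3.0
X = [
    [1.0, 4.0, 2.0],
    [3.0, 0.0, 6.0],
    [5.0, 2.0, 4.0],
]


def predict(row):
    """Linear model: intercept + sum_j coef[j] * row[j]."""
    return intercept + sum(c * v for c, v in zip(coef, row))


n = len(X)
# Per-feature mean over the background data, i.e. X.mean(0).
feature_means = [sum(row[j] for row in X) / n for j in range(len(coef))]
# Average prediction over the background data (the base value).
expected_value = sum(predict(row) for row in X) / n

x = [4.0, 1.0, 2.0]  # the row being explained
phi = [c * (v - m) for c, v, m in zip(coef, x, feature_means)]

print(phi)
print(sum(phi), predict(x) - expected_value)  # the two numbers agree
```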

SHAP KernelExplainer AttributeError numpy.ndarray

student

April 14, 2022 07:42

I've developed a text classifier in the form of a Python function that takes an np.array of strings as input (each string is one observation): def model(vector_of_strins): ... # do smthg return vec_of_probabilities # like [0.1, 0.23, ..., 0.09] When I try to use KernelExplainer from the shap package like this: test_texts = pd.Series(['text1','text2','text3']) shap.KernelExplainer(model, test_texts) I receive the following error: AttributeError: 'numpy.ndarray' object has no attribute 'find' What can I do about it?

Topic: shap explainable-ai predictor-importance nlp python

Category: Data Science

References on how to use SHAP values without the shap package

Phillip Maire

April 5, 2022 00:15

I am familiar with the shap Python package and how to use it. I also have a pretty good idea about SHAP values in general, but they are still new to me. What I'm requesting are references (ideally custom Python code in blog posts) that explain how to take an array of raw SHAP values (of shape num_features X num_samples) and get: feature importance, interaction terms, and any other calculations the shap package does. My motivation for this is that I …

Topic: gradient-boosting-decision-trees shap data-science-model python

Category: Data Science
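
For the feature-importance part of the question above, one common convention is the mean absolute SHAP value per feature. The sketch below assumes a raw array of shape num_features X num_samples, as described in the question; the numbers and feature names are made up, and plain lists stand in for a NumPy array. Interaction terms are not covered here, since they are not recoverable from this array alone.

```python
# Hypothetical raw SHAP values, shape num_features x num_samples.
shap_values = [
    [0.20, -0.40, 0.30, -0.10],   # feature "a"
    [0.05, 0.00, -0.05, 0.10],    # feature "b"
    [-0.60, 0.50, 0.70, -0.20],   # feature "c"
]
feature_names = ["a", "b", "c"]  # assumed names, for illustration

# Global importance: average magnitude of each feature's contribution.
importance = {
    name: sum(abs(v) for v in row) / len(row)
    for name, row in zip(feature_names, shap_values)
}

# Features ordered from most to least important.
ranking = sorted(importance, key=importance.get, reverse=True)

for name in ranking:
    print(f"{name}: {importance[name]:.3f}")
```

Taking the absolute value before averaging matters: positive and negative contributions would otherwise cancel, and a feature with large effects in both directions would look unimportant.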

Does shap value use the target variable in its calculation?

Phillip Maire

April 4, 2022 23:44

I saw this answer mentioning that shap uses the target variable, but I can get SHAP values without the target variable, for example: explainer = shap.TreeExplainer(model) shap_values = explainer.shap_values(x_val) where model is my loaded model, which does not contain target data, and x_val is a matrix of features X samples, which does not contain my target data. I know SHAP is an approximation of Shapley values; does this have something to do with it?

Topic: shap python

Category: Data Science

Sample size for SHAP explainer and range of a SHAP value

The Great

March 25, 2022 15:23

I am working on a binary classification problem with 977 records and a 77:23 class proportion. I used a random forest model. After running the SHAP package, I got the plots below. I also see that SHAP requires us to select a sample size to get the SHAP values, as shown in this post. If SHAP does not use the same neighborhood assumption as LIME, why does it require a sample size to be specified? To summarize, my questions are as …

Topic: shap random-forest classification predictive-modeling machine-learning

Category: Data Science

How to interpret SHAP summary plot?

The Great

March 19, 2022 19:27

I have already referred to these posts, here and here, so please don't mark this as a duplicate. I am doing binary classification using a random forest, and the class labels are 1 and 0. The question being modeled is: what is the likelihood that a supplier will meet the target? I got the output below from the SHAP summary plot. How do I know which features lead to class 1 and which to class 0? Does it mean that high values of each feature lead to class 1? And low values of each …

Topic: shap random-forest classification predictive-modeling machine-learning

Category: Data Science

Are predictive features with 0 SHAP values included in the model?

Giampaolo Levorato

March 13, 2022 14:34

I have trained an XGBoost model while enforcing no feature interactions and calculated global SHAP values: It looks like only 6 features have nonzero SHAP values, whilst the remaining ones have a SHAP value of 0. Question: if a feature has a SHAP value of 0 across all records in the sample, does it mean that the feature has not been included in the model?

Topic: shap xgboost correlation

Category: Data Science

Why are SHAP values not an indication of cause?

Giampaolo Levorato

March 10, 2022 23:54

I have trained an XGBoost Classifier and I am now trying to explain how and, most importantly, why the model has made the predictions it's made. In the documentation entry Be careful when interpreting predictive models in search of causal insights, I have read that SHAP values are indicative of correlation but not causation. More specifically: SHAP makes transparent the correlations picked up by predictive ML models. But making correlations transparent does not make them causal! All predictive models implicitly …

Topic: cause-and-effect shap xgboost correlation

Category: Data Science

BorutaShap implementation

spectre

March 10, 2022 22:07

I want to use BorutaShap for feature selection in my model. I have my train_x as a numpy.ndarray and I want to pass it to the BorutaShap instance. When I try to fit, I get this error: AttributeError: 'numpy.ndarray' object has no attribute 'columns' Below is my code: num_trans = Pipeline(steps = [('impute', SimpleImputer(strategy = 'mean')), ('scale', StandardScaler())]) cat_trans = Pipeline(steps = [('impute', SimpleImputer(strategy = 'most_frequent')), ('encode', OneHotEncoder(handle_unknown = 'ignore'))]) from sklearn.compose import ColumnTransformer preproc = ColumnTransformer(transformers = [('cat', …

Topic: boruta shap feature-selection python

Category: Data Science

Do monotonic constraints prevent an XGBoost model from capturing non-linear relationships in the data?

Giampaolo Levorato

March 8, 2022 10:34

I have trained an XGBoost model (for a binary classification problem) and I have tested two scenarios: Scenario 1 - No monotonic constraints applied. In this case I get a Gini on the training sample of 81.1 and a Gini on the validation sample of 76.5, raising a red flag for overfitting. I have taken a look at the SHAP dependence plot for one char in Scenario 1 and it looks like this: Scenario 2 - Monotonic constraints applied. In …

Topic: shap overfitting xgboost

Category: Data Science

Differences between Feature Importance and SHAP variable importance graph

Giampaolo Levorato

March 5, 2022 16:10

I have run an XGBClassifier using the following fields: - predictive features = ['sbp','tobacco','ldl','adiposity','obesity','alcohol','age'] - binary target = 'Target' I have produced the following feature importance plot: I understand that, generally speaking, importance provides a score that indicates how useful or valuable each feature was in the construction of the boosted decision trees within the model. The more an attribute is used to make key decisions with decision trees, the higher its relative importance. From the list of 7 predictive …

Topic: feature-importances shap xgboost

Category: Data Science

Why does SHAP's TreeExplainer with "interventional" method not match exact SHAP?

asafr

March 5, 2022 01:40

I am trying to understand the concepts/definitions behind the SHAP method of explaining model predictions. In particular I've read the original SHAP paper and the TreeExplainer paper. The original paper lays out a particular, well-defined set of values for a given model prediction on tabular data, which can be computed exactly (although this is very slow in practice, so the paper/package gives various other algorithms as "approximations".) In the TreeExplainer paper, algorithm 1 & 2 for "TreeExplainer with path-dependent feature …

Topic: shap interpretation explainable-ai machine-learning

Category: Data Science

Aggregation of feature importance

dean

February 18, 2022 12:01

I have more of a conceptual question I was hoping to get some feedback on. I am trying to run a boosted regression ML model to identify a subset of important predictors for some clinical condition. The dataset includes over 100000 rows, and close to 1000 predictors. Now, the etiology of the disease we are trying to predict is largely unknown. Thus, we likely don’t have data on many important predictors for the condition. That is to say, as a …

Topic: feature-importances shap xgboost gbm cross-validation

Category: Data Science

LSTM Shapley Deep Explainer TimeseriesGenerator Keras

DomIsAwesomee

February 10, 2022 15:34

I have this data in the form: X_train shape: (2724, 10) , y_train shape: (2724,) X_test shape: (682, 10) , y_test shape: (682,) which I feed into Keras' TimeseriesGenerator: window_length = 63 batch_size = 32 train_generator = TimeseriesGenerator(X_train, y_train, length=window_length, sampling_rate=1, batch_size=batch_size, stride=1) test_generator = TimeseriesGenerator(X_test, y_test, length=window_length, sampling_rate=1, batch_size=batch_size, stride=1) This is the type of train_generator: type(train_generator): <class 'tensorflow.python.keras.preprocessing.sequence.TimeseriesGenerator'> Each sequence looks like this: for i in range(len(train_generator)): x, y = train_generator[i] print(x.shape, y.shape) (32, 63, 10) (32,) (32, …

Topic: shap keras deep-learning time-series python

Category: Data Science

How are two linear models with features f1 and (C-f1) similar or different?

NAS

January 17, 2022 09:12

I am training a linear model. I'm planning to update this model every month. I have two perfectly correlated features such that f1+f2=C, where C is a constant. Since I cannot include both, I will be including just f1. I have a dashboard where I am showcasing the results. If I want to see the effect of both features, what should I do? Or, how do I interpret the result for f2 given the coefficient and feature importance of f1? I'm …

Topic: shap machine-learning-model linear-regression

Category: Data Science
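
For the f1 + f2 = C question above, substituting f1 = C - f2 into a linear model y = b0 + b1 * f1 gives y = (b0 + b1 * C) - b1 * f2: the same predictions, with the coefficient's sign flipped and the intercept shifted. The sketch below checks this numerically; the values of C, b0, and b1 are hypothetical, and other features in the model are omitted for brevity.

```python
C = 10.0
b0, b1 = 1.5, 0.8  # hypothetical intercept and coefficient fitted on f1


def model_f1(f1):
    """Model expressed in terms of f1."""
    return b0 + b1 * f1


# Substituting f1 = C - f2 yields an equivalent model in terms of f2.
b0_f2 = b0 + b1 * C   # intercept absorbs the constant
b1_f2 = -b1           # coefficient changes sign


def model_f2(f2):
    """The same model expressed in terms of f2."""
    return b0_f2 + b1_f2 * f2


f1_samples = [0.0, 2.5, 7.0, 10.0]
max_diff = max(abs(model_f1(f1) - model_f2(C - f1)) for f1 in f1_samples)
print(max_diff)  # effectively zero: the two parameterizations agree
```

Under the linear SHAP formula coef * (x - mean), the attribution for f2 in the second parameterization equals the attribution for f1 in the first, because both the coefficient and the centered feature value change sign.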
