Is multicollinearity a problem when interpreting SHAP values from an XGBoost model?

I'm using an XGBoost model for multi-class classification and am looking at feature importance using SHAP values. I'm curious whether multicollinearity is a problem for the interpretation of the SHAP values. As far as I know, XGBoost is not affected by multicollinearity, so I assume SHAP won't be affected either?
Category: Data Science

SHAP value interpretation for classification

I'm trying to understand how SHAP values are calculated for classification. As far as I understand, for each feature the SHAP value is calculated by: $$ \phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!} \left[ f_{S\cup\{i\}}(x_{S\cup\{i\}}) - f_S(x_S) \right] $$ For regression it makes sense that for three features $\{A,B,C\}$ each feature has a value. The prediction for one row might be $f(\{A,B,C\}) = 50$. Then all possible coalitions are evaluated with and without the feature to find the marginal contribution …
Category: Data Science
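
A worked sketch of the coalition sum quoted above may help make it concrete. This is a toy example with a hypothetical value function (a dict mapping each coalition to a prediction for one row), not the shap library's actual implementation.

import itertools
from math import factorial

# Hypothetical value function: the model's prediction restricted to each coalition.
# In practice f_S(x_S) comes from retraining or marginalising over the missing features.
features = ["A", "B", "C"]
value = {
    frozenset(): 20.0,
    frozenset("A"): 30.0,
    frozenset("B"): 25.0,
    frozenset("C"): 22.0,
    frozenset("AB"): 40.0,
    frozenset("AC"): 35.0,
    frozenset("BC"): 28.0,
    frozenset("ABC"): 50.0,   # full-model prediction for this row
}

def shapley(i, features, value):
    """Exact Shapley value of feature i: weighted marginal contributions over all coalitions."""
    rest = [f for f in features if f != i]
    n = len(features)
    phi = 0.0
    for size in range(len(rest) + 1):
        for S in itertools.combinations(rest, size):
            S = frozenset(S)
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi += weight * (value[S | {i}] - value[S])
    return phi

for f in features:
    print(f, shapley(f, features, value))

# The three values sum to value[{A,B,C}] - value[{}] = 50 - 20 = 30 (efficiency property).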

Why does removing the sentences with the most contributing words, and measuring the drop in the result, help show that a model is "*faithful*"?

I don't understand how computing the score after removing the sentences whose words contribute most to the result helps show to what extent a model is "faithful" to a reasoning process. A faithfulness score was proposed by Du et al. in 2019 to verify the importance of the identified contributing sentences or words to a given model's outputs. It is assumed that the probability value for the predicted class will drop significantly if the truly …
Category: Data Science

Exact SHAP calculations for logistic regression?

Given the relatively simple form of standard logistic regression, I was wondering whether there is an exact calculation of SHAP values for logistic regression. To be clear, I am looking for a closed formula depending on the features ($X_i$) and coefficients ($\beta_i$) to calculate the Shapley values and their corresponding importance.
Category: Data Science

What are the SHAP values for a linear model? How do we derive them?

What are the SHAP values for a linear model? The documentation gives them as follows: assuming features are independent leads to interventional SHAP values, which for a linear model are coef[i] * (x[i] - X.mean(0)[i]) for the i-th feature. Can someone explain to me how this is derived, or direct me to a resource explaining it?
Category: Data Science
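
The documentation formula quoted above can be checked numerically. A minimal sketch, assuming a scikit-learn LinearRegression on synthetic data (the same idea applies to any linear model):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

model = LinearRegression().fit(X, y)
x = X[0]

# Interventional SHAP values for a linear model with independent features:
# phi_i = coef[i] * (x[i] - E[X_i]); the feature mean plays the role of the baseline.
phi = model.coef_ * (x - X.mean(0))

# Efficiency check: prediction = expected prediction + sum of the phi_i.
print(phi)
print(model.predict(x.reshape(1, -1))[0], model.predict(X).mean() + phi.sum())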

Explainable anomaly detection

There are plenty of methods for explaining predictions in supervised learning (e.g. SHAP values, LIME). What about anomaly detection in unsupervised learning? Is there any model for which there are libraries that can give you justifications, such as "row x is an anomaly because feature 1 is higher than 5.3 and feature 5 is equal to 'No'"?
Category: Data Science
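
One route that is sometimes taken (an assumption on my part, not the only possible answer to the question): fit a tree-based detector such as IsolationForest and explain its anomaly scores with SHAP's TreeExplainer. The output is per-feature contributions rather than the rule-style sentence described above.

import numpy as np
import shap
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
X[0] = [8.0, 0.0, 0.0, 0.0, -6.0]          # planted anomaly in row 0

iso = IsolationForest(random_state=0).fit(X)

# TreeExplainer accepts IsolationForest (in recent shap versions); the SHAP values
# explain the anomaly score, so features pushing row 0 towards "anomalous" dominate.
explainer = shap.TreeExplainer(iso)
contrib = explainer.shap_values(X[:1])
print(dict(zip([f"feature_{i}" for i in range(5)], np.round(contrib[0], 3))))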

Anomaly detection and root cause analysis

ARIMA is widely used for anomaly detection on time-series data, e.g. stock price prediction. ARIMA assumes that the future value of a variable (the stock price in our case) depends on its previous values. When we do root cause analysis of a detected anomaly, there can be numerous causes, e.g. the Russia-Ukraine war. I have two questions: Isn't the assumption of ARIMA invalidated, because the stock price also depends on other factors such as war? Which models can I use to do …
Category: Data Science

SHAP KernelExplainer AttributeError numpy.ndarray

I've developed a text classifier in the form of a Python function that takes an np.array of strings (each string is one observation):

def model(vector_of_strings):
    ...  # do something
    return vec_of_probabilities  # like [0.1, 0.23, ..., 0.09]

When I try to use KernelExplainer from the shap package like this:

test_texts = pd.Series(['text1', 'text2', 'text3'])
shap.KernelExplainer(model, test_texts)

I receive the following error: AttributeError: 'numpy.ndarray' object has no attribute 'find'. What can I do about it?
Category: Data Science
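
One possible workaround (an assumption on my part, not a confirmed fix for this exact error): re-express the classifier as a pipeline over a numeric representation, here a hypothetical TF-IDF vectorizer, so that KernelExplainer only ever sees numeric rows rather than raw strings.

import numpy as np
import shap
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = np.array(["good movie", "bad movie", "great plot", "terrible plot"])
labels = np.array([1, 0, 1, 0])

# Hypothetical pipeline standing in for the original model() function.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts).toarray()
clf = LogisticRegression().fit(X, labels)

def predict_from_vectors(vectors):
    # Takes numeric rows (what KernelExplainer perturbs), not strings.
    return clf.predict_proba(vectors)[:, 1]

explainer = shap.KernelExplainer(predict_from_vectors, X)
shap_values = explainer.shap_values(X[:1])
print(shap_values)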

What is "Gradient × Hidden States" explainability method? Is there any documentation about it?

I am doing a literature review on post-hoc, gradient-based explainability methods. I stumbled upon one I hadn't heard of, used to extract highlights from a trained model in this post-hoc fashion: We compute gradients w.r.t. the hidden states of each layer, and multiply the resultant vectors by the hidden state vectors themselves: $\nabla_{H_i} \times H_i \in \mathbb{R}^{N+M}$, for $0 \leq i \leq L + 1$ - Marco V. Treviso et al., submission for the Explainable Quality Estimation Shared Task …
Category: Data Science
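
Based only on the description quoted above, here is a minimal PyTorch sketch of what "Gradient × Hidden States" might look like on a toy network. The layer shapes and the per-token reduction are my assumptions, not Treviso et al.'s exact setup.

import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy stand-in for an encoder: "layer 0" and "layer 1" hidden states.
emb = nn.Linear(8, 16)
layer1 = nn.Linear(16, 16)
head = nn.Linear(16, 1)

x = torch.randn(5, 8)        # 5 "tokens" with 8 input features each

h0 = emb(x)
h0.retain_grad()             # keep gradients on the non-leaf hidden states
h1 = torch.tanh(layer1(h0))
h1.retain_grad()
score = head(h1).sum()       # scalar output to differentiate

score.backward()

# Gradient x Hidden States: elementwise product, one attribution tensor per layer.
attr_layer0 = h0.grad * h0
attr_layer1 = h1.grad * h1

# One relevance score per token, here taken as the sum over the hidden dimension.
print(attr_layer0.sum(dim=-1))
print(attr_layer1.sum(dim=-1))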

How to restructure my dataset for interpretability without losing performance?

What I am doing: I am predicting product ratings using boosted trees (XGBoost) with a dataset in this format:
What I want to do: I want to use SHAP TreeExplainer to interpret each prediction my model gives in terms of product attributes and user ids.
What I am getting: My model is drawing all of its conclusions from product names and user ids, instead of product attributes and user ids.
What I tried: I discovered that each product name has …
Category: Data Science

How to interpret integrated gradients in an NLP toxic text classification use-case?

I am trying to understand how integrated gradients work in the NLP case. Let $F: \mathbb{R}^{n} \rightarrow[0,1]$ be a function representing a neural network, $x \in \mathbb{R}^{n}$ an input and $x' \in \mathbb{R}^{n}$ a reference. We consider the segment connecting $x'$ to $x$, and we compute the gradient at every point of this segment. The IG method simply integrates (sums) these gradients. Thus, IG in the $i$-th dimension is given by the following formula: $$ IG_{i}(x)=\left(x_{i}-x'_{i}\right) \int_{\alpha=0}^{1} \frac{\partial F\left(x'+\alpha\left(x-x'\right)\right)}{\partial x_{i}}\, d\alpha $$ …
Category: Data Science
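
A minimal sketch of the formula above via a Riemann-sum approximation. A toy scoring function stands in for the trained classifier $F$; in the NLP case $x$ would typically be an embedding and $x'$ a zero or padding embedding.

import torch

def integrated_gradients(f, x, x_ref, steps=50):
    # Points on the straight path from the reference x' to the input x.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    path = (x_ref + alphas * (x - x_ref)).detach().requires_grad_(True)
    f(path).sum().backward()                 # gradient of F at every path point
    avg_grad = path.grad.mean(dim=0)         # approximates the integral over alpha
    return (x - x_ref) * avg_grad            # scale by the input-reference difference

# Toy scoring function standing in for the trained model F.
w = torch.tensor([0.5, -1.2, 2.0])
f = lambda z: torch.sigmoid(z @ w)

x = torch.tensor([1.0, 0.5, -0.3])           # input
x_ref = torch.zeros(3)                       # reference (baseline)
print(integrated_gradients(f, x, x_ref))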

Using ML Interpretability Techniques for Data Analysis Instead of Strictly Model Analysis

Hope you lot are doing alright. I have been looking into Explainable AI and model interpretability lately, and I had an idea, but I am wondering whether it would constitute a valid use case. There is a data analysis project happening at work where we're trying to analyze the data we have on hand to determine the factors that affect our KPOs and possibly derive useful, actionable insights. Instead of moving forward with manually evaluating correlations and doing EDA that way, I …
Category: Data Science

What is the meaning of an empty SHAP graph in Explainable AI?

Using Python, I created a neural network to perform predictions on a binary-class dataset (e.g. will a passenger survive the Titanic?). I am using the SHAP package to explain individual predictions. For all of the instances in this dataset, the visualization outputted by SHAP has an output value of 0 and the higher/lower graph is empty (there are no features listed).

shap.force_plot(k_explainer.expected_value[0], k_shap_values[0], label_test_X.iloc[0])

When I use a different dataset and run the line above, SHAP outputs a graph …
Category: Data Science

Why does SHAP's TreeExplainer with the "interventional" method not match exact SHAP?

I am trying to understand the concepts and definitions behind the SHAP method of explaining model predictions. In particular, I've read the original SHAP paper and the TreeExplainer paper. The original paper lays out a particular, well-defined set of values for a given model prediction on tabular data, which can be computed exactly (although this is very slow in practice, so the paper/package gives various other algorithms as "approximations"). In the TreeExplainer paper, algorithms 1 & 2 for "TreeExplainer with path-dependent feature …
Category: Data Science
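
For reference, a minimal sketch of the two TreeExplainer modes being compared in the question; the model and background-data choices are mine, and in the interventional mode the result depends on that background set.

import shap
import xgboost
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=300, n_features=5, random_state=0)
model = xgboost.XGBRegressor(n_estimators=50).fit(X, y)

# Path-dependent mode: uses node cover statistics from the trees, no background data.
path_dep = shap.TreeExplainer(model, feature_perturbation="tree_path_dependent")

# Interventional mode: marginalises missing features over an explicit background set.
interventional = shap.TreeExplainer(model, data=X[:100], feature_perturbation="interventional")

print(path_dep.shap_values(X[:1]))
print(interventional.shap_values(X[:1]))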

Is there a way to output feature importance based on the predicted class?

I'm running a random forest classifier in Python (two classes). I am using the feature_importances_ attribute of the RandomForestClassifier to get feature importances. It provides a nice visualization of importances, but it does not offer insight into which features were most important for each class. For example, for class 1 some feature values may have been important, whereas for class 2 some other feature values were more important. Is it possible to split feature importance based on the …
Category: Data Science
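
feature_importances_ is a single global vector, but per-class attributions can be read off SHAP values. A minimal sketch on synthetic data; the handling of the return shape depends on the shap version, hence the branch.

import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=6, n_informative=4, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

sv = shap.TreeExplainer(rf).shap_values(X)
# Older shap returns a list with one (n_samples, n_features) array per class;
# newer versions return a single (n_samples, n_features, n_classes) array.
per_class = sv if isinstance(sv, list) else [sv[..., k] for k in range(sv.shape[-1])]

for k, class_sv in enumerate(per_class):
    print(f"class {k}, mean |SHAP| per feature:", np.abs(class_sv).mean(axis=0).round(3))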

What have my models learnt?

I am doing a time series classification task. I used an LSTM and a Bi-LSTM; the Bi-LSTM works a little better than the single-layer LSTM, and concatenating the two Bi-LSTM outputs with another input gives me a better result. But after all, what have my models learnt? I actually don't think there are any patterns in this time series. How does the LSTM produce its outputs from these irregular data? Why does this model work better than the other? Is it pure luck?
Category: Data Science

Multi-valued categorical features in LIME

I am working with the LIME implementation by Marco Ribeiro (https://github.com/marcotcr/lime). Specifically, I am using the LimeTabularExplainer, as I have a mixture of numerical and categorical features in my dataset. How would I represent categorical features that may take on zero or more values in a single example? I understand that the API requires categorical features to be converted to an integer representation, but how would I represent combinations of values for one categorical feature? To illustrate the circumstance, see the …
Category: Data Science
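
One common representation (an assumption on my part, not something the LIME API prescribes): expand the multi-valued feature into one binary indicator column per possible value and mark each indicator as categorical. A minimal sketch with a hypothetical "tags" feature:

import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: one numeric feature plus a "tags" feature that can hold
# zero or more of {red, green, blue}, expanded into three 0/1 indicator columns.
feature_names = ["price", "tag_red", "tag_green", "tag_blue"]
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=200), rng.integers(0, 2, (200, 3))]).astype(float)
y = (X[:, 1] + X[:, 3] > 1).astype(int)

clf = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X,
    feature_names=feature_names,
    categorical_features=[1, 2, 3],   # the indicator columns are categorical
    class_names=["no", "yes"],
    mode="classification",
)
exp = explainer.explain_instance(X[0], clf.predict_proba, num_features=4)
print(exp.as_list())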

Which AI algorithm is best for chess?

I'm working on my chess bot, and I would like to implement a simple artificial intelligence for it. I'm new to this, so I'm unsure how to approach it specifically for chess. I've heard about Q-learning, supervised/unsupervised learning, genetic algorithms, etc., but they are probably not all suited to chess. I wondered how AlphaZero was created? Probably a genetic algorithm, but chess is a game where "if A then B" might not work. That means Q-learning is also bad for it, and so on. …
Category: Data Science

How to stop a text-classification model from depending on only a couple of words from the input text instead of the entire sentence?

I have a text classification deep-learning model which takes in text and outputs a softmax probability. I am using GloVe embeddings to represent my input text in numerical form for the DL model. The DL model is actually quite simple too: the embedding layer is trainable and no weights have been passed to it. After the training, while I was making predictions on unseen text, I realised that only one of the key words had huge importance in …
Category: Data Science
