Is multicollinearity a problem when interpreting SHAP values from an XGBoost model?

I'm using an XGBoost model for multi-class classification and am looking at feature importance using SHAP values. I'm curious whether multicollinearity is a problem for the interpretation of the SHAP values. As far as I know, XGBoost is not affected by multicollinearity, so I assume SHAP won't be affected either?
Category: Data Science

SHAP value interpretation for classification

I'm trying to understand how SHAP values are calculated for classification. As far as I understand, for each feature the SHAP value is calculated by: $$ \phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!} \left[ f_{S\cup\{i\}}(x_{S\cup\{i\}}) - f_S(x_S) \right] $$ For regression it makes sense that for three features $\{A,B,C\}$ each feature has a value. The prediction for one row might be $f(\{A,B,C\}) = 50$. Then all possible coalitions are evaluated with and without the feature to find the marginal contribution …
Category: Data Science
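
A worked sketch of the coalition sum quoted above may help make it concrete. This is a toy example with a hypothetical value function (a dict mapping each coalition to a prediction for one row), not the shap library's actual implementation.

import itertools
from math import factorial

# Hypothetical value function: the model's prediction restricted to each coalition.
# In practice f_S(x_S) comes from retraining or marginalising over the missing features.
features = ["A", "B", "C"]
value = {
    frozenset(): 20.0,
    frozenset("A"): 30.0,
    frozenset("B"): 25.0,
    frozenset("C"): 22.0,
    frozenset("AB"): 40.0,
    frozenset("AC"): 35.0,
    frozenset("BC"): 28.0,
    frozenset("ABC"): 50.0,   # full-model prediction for this row
}

def shapley(i, features, value):
    """Exact Shapley value of feature i: weighted marginal contributions over all coalitions."""
    rest = [f for f in features if f != i]
    n = len(features)
    phi = 0.0
    for size in range(len(rest) + 1):
        for S in itertools.combinations(rest, size):
            S = frozenset(S)
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi += weight * (value[S | {i}] - value[S])
    return phi

for f in features:
    print(f, shapley(f, features, value))

# The three values sum to value[{A,B,C}] - value[{}] = 50 - 20 = 30 (efficiency property).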

Why does removing the sentences with the most contributing words, and measuring the drop in the result, help show that a model is "*faithful*"?

I don't understand how computing the score after removing the sentences whose words contribute most to the result helps show to what extent a model is "faithful" to a reasoning process. A faithfulness score was proposed by Du et al. in 2019 to verify the importance of the identified contributing sentences or words to a given model's outputs. It is assumed that the probability value for the predicted class will drop significantly if the truly …
Category: Data Science

Exact SHAP calculations for logistic regression?

Given the relatively simple form of standard logistic regression, I was wondering whether there is an exact calculation of SHAP values for logistic regression. To be clear, I am looking for a closed formula depending on the features ($X_i$) and coefficients ($\beta_i$) to calculate the Shapley values and their corresponding importance.
Category: Data Science

What are the SHAP values for a linear model? How do we derive them?

What are the SHAP values for a linear model? The documentation gives them as follows: assuming features are independent leads to interventional SHAP values, which for a linear model are coef[i] * (x[i] - X.mean(0)[i]) for the i-th feature. Can someone explain to me how this is derived, or direct me to a resource explaining it?
Category: Data Science
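
The documentation formula quoted above can be checked numerically. A minimal sketch, assuming a scikit-learn LinearRegression on synthetic data (the same idea applies to any linear model):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

model = LinearRegression().fit(X, y)
x = X[0]

# Interventional SHAP values for a linear model with independent features:
# phi_i = coef[i] * (x[i] - E[X_i]); the feature mean plays the role of the baseline.
phi = model.coef_ * (x - X.mean(0))

# Efficiency check: prediction = expected prediction + sum of the phi_i.
print(phi)
print(model.predict(x.reshape(1, -1))[0], model.predict(X).mean() + phi.sum())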

Explainable anomaly detection

There are plenty of methods for explaining predictions in supervised learning (e.g. SHAP values, LIME). What about anomaly detection in unsupervised learning? Is there any model for which there are libraries that can give you justifications, such as "row x is an anomaly because feature 1 is higher than 5.3 and feature 5 is equal to 'No'"?
Category: Data Science
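
One route that is sometimes taken (an assumption on my part, not the only possible answer to the question): fit a tree-based detector such as IsolationForest and explain its anomaly scores with SHAP's TreeExplainer. The output is per-feature contributions rather than the rule-style sentence described above.

import numpy as np
import shap
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
X[0] = [8.0, 0.0, 0.0, 0.0, -6.0]          # planted anomaly in row 0

iso = IsolationForest(random_state=0).fit(X)

# TreeExplainer accepts IsolationForest (in recent shap versions); the SHAP values
# explain the anomaly score, so features pushing row 0 towards "anomalous" dominate.
explainer = shap.TreeExplainer(iso)
contrib = explainer.shap_values(X[:1])
print(dict(zip([f"feature_{i}" for i in range(5)], np.round(contrib[0], 3))))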

Anomaly detection and root cause analysis

ARIMA is widely used for anomaly detection on time-series data, e.g. stock price prediction. ARIMA assumes that the future value of a variable (the stock price in our case) depends on its previous values. When we do root cause analysis of a detected anomaly, there can be numerous causes, e.g. the Russia-Ukraine war. I have two questions: Isn't the assumption of ARIMA invalidated, because the stock price also depends on other factors such as war? Which models can I use to do …
Category: Data Science

SHAP KernelExplainer AttributeError numpy.ndarray

I've developed a text classifier in the form of a Python function that takes an np.array of strings (each string is one observation):

def model(vector_of_strings):
    ...  # do something
    return vec_of_probabilities  # like [0.1, 0.23, ..., 0.09]

When I try to use KernelExplainer from the shap package like this:

test_texts = pd.Series(['text1', 'text2', 'text3'])
shap.KernelExplainer(model, test_texts)

I receive the following error: AttributeError: 'numpy.ndarray' object has no attribute 'find'. What can I do about it?
Category: Data Science
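
One possible workaround (an assumption on my part, not a confirmed fix for this exact error): re-express the classifier as a pipeline over a numeric representation, here a hypothetical TF-IDF vectorizer, so that KernelExplainer only ever sees numeric rows rather than raw strings.

import numpy as np
import shap
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = np.array(["good movie", "bad movie", "great plot", "terrible plot"])
labels = np.array([1, 0, 1, 0])

# Hypothetical pipeline standing in for the original model() function.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts).toarray()
clf = LogisticRegression().fit(X, labels)

def predict_from_vectors(vectors):
    # Takes numeric rows (what KernelExplainer perturbs), not strings.
    return clf.predict_proba(vectors)[:, 1]

explainer = shap.KernelExplainer(predict_from_vectors, X)
shap_values = explainer.shap_values(X[:1])
print(shap_values)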

What is "Gradient × Hidden States" explainability method? Is there any documentation about it?

I am doing a literature review on post-hoc, gradient-based explainability methods. I stumbled upon one I hadn't heard of, used to extract highlights from a trained model in this post-hoc fashion: We compute gradients w.r.t. the hidden states of each layer, and multiply the resultant vectors by the hidden state vectors themselves: $\nabla_{H_i} \times H_i \in \mathbb{R}^{N+M}$, for $0 \leq i \leq L + 1$ - Marco V. Treviso et al., submission for the Explainable Quality Estimation Shared Task …
Category: Data Science
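
Based only on the description quoted above, here is a minimal PyTorch sketch of what "Gradient × Hidden States" might look like on a toy network. The layer shapes and the per-token reduction are my assumptions, not Treviso et al.'s exact setup.

import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy stand-in for an encoder: "layer 0" and "layer 1" hidden states.
emb = nn.Linear(8, 16)
layer1 = nn.Linear(16, 16)
head = nn.Linear(16, 1)

x = torch.randn(5, 8)        # 5 "tokens" with 8 input features each

h0 = emb(x)
h0.retain_grad()             # keep gradients on the non-leaf hidden states
h1 = torch.tanh(layer1(h0))
h1.retain_grad()
score = head(h1).sum()       # scalar output to differentiate

score.backward()

# Gradient x Hidden States: elementwise product, one attribution tensor per layer.
attr_layer0 = h0.grad * h0
attr_layer1 = h1.grad * h1

# One relevance score per token, here taken as the sum over the hidden dimension.
print(attr_layer0.sum(dim=-1))
print(attr_layer1.sum(dim=-1))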

How to restructure my dataset for interpretability without losing performance?

What I am doing: I am predicting product ratings using boosted trees (XGBoost) with a dataset in this format:
What I want to do: I want to use SHAP TreeExplainer to interpret each prediction my model gives in terms of product attributes and user ids.
What I am getting: My model is drawing all of its conclusions from product names and user ids, instead of product attributes and user ids.
What I tried: I discovered that each product name has …
Category: Data Science

How to interpret integrated gradients in an NLP toxic text classification use-case?

I am trying to understand how integrated gradients work in the NLP case. Let $F: \mathbb{R}^{n} \rightarrow[0,1]$ be a function representing a neural network, $x \in \mathbb{R}^{n}$ an input and $x' \in \mathbb{R}^{n}$ a reference. We consider the segment connecting $x'$ to $x$, and we compute the gradient at every point of this segment. The IG method simply integrates (sums) these gradients. Thus, IG in the $i$-th dimension is given by the following formula: $$ IG_{i}(x)=\left(x_{i}-x'_{i}\right) \int_{\alpha=0}^{1} \frac{\partial F\left(x'+\alpha\left(x-x'\right)\right)}{\partial x_{i}}\, d\alpha $$ …
Category: Data Science
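
A minimal sketch of the formula above via a Riemann-sum approximation. A toy scoring function stands in for the trained classifier $F$; in the NLP case $x$ would typically be an embedding and $x'$ a zero or padding embedding.

import torch

def integrated_gradients(f, x, x_ref, steps=50):
    # Points on the straight path from the reference x' to the input x.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    path = (x_ref + alphas * (x - x_ref)).detach().requires_grad_(True)
    f(path).sum().backward()                 # gradient of F at every path point
    avg_grad = path.grad.mean(dim=0)         # approximates the integral over alpha
    return (x - x_ref) * avg_grad            # scale by the input-reference difference

# Toy scoring function standing in for the trained model F.
w = torch.tensor([0.5, -1.2, 2.0])
f = lambda z: torch.sigmoid(z @ w)

x = torch.tensor([1.0, 0.5, -0.3])           # input
x_ref = torch.zeros(3)                       # reference (baseline)
print(integrated_gradients(f, x, x_ref))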

Using ML Interpretability Techniques for Data Analysis Instead of Strictly Model Analysis

Hope you lot are doing alright. I have been looking into Explainable AI and model interpretability lately, and I had an idea, but I am wondering whether it would constitute a valid use case. There is a data analysis project happening at work where we're trying to analyze the data we have on hand to determine the factors that affect our KPOs and possibly derive useful, actionable insights. Instead of moving forward with manually evaluating correlations and doing EDA that way, I …
Category: Data Science

What is the meaning of an empty SHAP graph in Explainable AI?

Using Python, I created a neural network to perform predictions on a binary-class dataset (e.g. will a passenger survive the Titanic?). I am using the SHAP package to explain individual predictions. For all of the instances in this dataset, the visualization outputted by SHAP has an output value of 0 and the higher/lower graph is empty (there are no features listed).

shap.force_plot(k_explainer.expected_value[0], k_shap_values[0], label_test_X.iloc[0])

When I use a different dataset and run the line above, SHAP outputs a graph …
Category: Data Science

Why does SHAP's TreeExplainer with the "interventional" method not match exact SHAP?

I am trying to understand the concepts and definitions behind the SHAP method of explaining model predictions. In particular, I've read the original SHAP paper and the TreeExplainer paper. The original paper lays out a particular, well-defined set of values for a given model prediction on tabular data, which can be computed exactly (although this is very slow in practice, so the paper/package gives various other algorithms as "approximations"). In the TreeExplainer paper, algorithms 1 & 2 for "TreeExplainer with path-dependent feature …
Category: Data Science
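
For reference, a minimal sketch of the two TreeExplainer modes being compared in the question; the model and background-data choices are mine, and in the interventional mode the result depends on that background set.

import shap
import xgboost
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=300, n_features=5, random_state=0)
model = xgboost.XGBRegressor(n_estimators=50).fit(X, y)

# Path-dependent mode: uses node cover statistics from the trees, no background data.
path_dep = shap.TreeExplainer(model, feature_perturbation="tree_path_dependent")

# Interventional mode: marginalises missing features over an explicit background set.
interventional = shap.TreeExplainer(model, data=X[:100], feature_perturbation="interventional")

print(path_dep.shap_values(X[:1]))
print(interventional.shap_values(X[:1]))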

Is there a way to output feature importance based on the predicted class?

I'm running a random forest classifier in Python (two classes). I am using the feature_importances_ attribute of the RandomForestClassifier to get feature importances. It provides a nice visualization of importances, but it does not offer insight into which features were most important for each class. For example, for class 1 some feature values may have been important, whereas for class 2 some other feature values were more important. Is it possible to split feature importance based on the …
Category: Data Science
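
feature_importances_ is a single global vector, but per-class attributions can be read off SHAP values. A minimal sketch on synthetic data; the handling of the return shape depends on the shap version, hence the branch.

import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=6, n_informative=4, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

sv = shap.TreeExplainer(rf).shap_values(X)
# Older shap returns a list with one (n_samples, n_features) array per class;
# newer versions return a single (n_samples, n_features, n_classes) array.
per_class = sv if isinstance(sv, list) else [sv[..., k] for k in range(sv.shape[-1])]

for k, class_sv in enumerate(per_class):
    print(f"class {k}, mean |SHAP| per feature:", np.abs(class_sv).mean(axis=0).round(3))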

What have my models learnt?

I am doing a time series classification task. I used an LSTM and a Bi-LSTM; the Bi-LSTM works a little better than the single-layer LSTM, and concatenating the two Bi-LSTM outputs with another input gives me a better result. But after all, what have my models learnt? I actually don't think there are any patterns in this time series. How does the LSTM produce its outputs from these irregular data? Why does this model work better than the other? Is it pure luck?
Category: Data Science

Multi-valued categorical features in LIME

I am working with the LIME implementation by Marco Ribeiro (https://github.com/marcotcr/lime). Specifically, I am using the LimeTabularExplainer, as I have a mixture of numerical and categorical features in my dataset. How would I represent categorical features that may take on zero or more values in a single example? I understand that the API requires categorical features to be converted to an integer representation, but how would I represent combinations of values for one categorical feature? To illustrate the circumstance, see the …
Category: Data Science
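
One common representation (an assumption on my part, not something the LIME API prescribes): expand the multi-valued feature into one binary indicator column per possible value and mark each indicator as categorical. A minimal sketch with a hypothetical "tags" feature:

import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: one numeric feature plus a "tags" feature that can hold
# zero or more of {red, green, blue}, expanded into three 0/1 indicator columns.
feature_names = ["price", "tag_red", "tag_green", "tag_blue"]
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=200), rng.integers(0, 2, (200, 3))]).astype(float)
y = (X[:, 1] + X[:, 3] > 1).astype(int)

clf = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X,
    feature_names=feature_names,
    categorical_features=[1, 2, 3],   # the indicator columns are categorical
    class_names=["no", "yes"],
    mode="classification",
)
exp = explainer.explain_instance(X[0], clf.predict_proba, num_features=4)
print(exp.as_list())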

Which AI algorithm is best for chess?

I'm working on my chess bot, and I would like to implement a simple artificial intelligence for it. I'm new to this, so I'm unsure how to approach it specifically for chess. I've heard about Q-learning, supervised/unsupervised learning, genetic algorithms, etc., but they are probably not all suited to chess. I wondered how AlphaZero was created? Probably a genetic algorithm, but chess is a game where "if A then B" might not work. That means Q-learning is also bad for it, and so on. …
Category: Data Science

How to stop a text-classification model from depending on only a couple of words from the input text instead of the entire sentence?

I have a text classification deep-learning model which takes in text and outputs a softmax probability. I am using GloVe embeddings to represent my input text in numerical form for the DL model. The DL model is actually quite simple too: the embedding layer is trainable and no weights have been passed to it. After the training, while I was making predictions on unseen text, I realised that only one of the key words had huge importance in …
Category: Data Science
