LimeTextExplainer for MultiClass classification - Facing issue for explain instance with custom classifier function

exp = explainer.explain_instance(df_val_final.Description[idx], predproba_list, num_features=5, top_labels=2)

When executing explain_instance of LimeTextExplainer, the statement above keeps running indefinitely and only stops if I interrupt the kernel, repeatedly printing the warning below:

C:\ProgramData\Anaconda3\lib\site-packages\fastai\torch_core.py:83: UserWarning: Tensor is int32: upgrading to int64; for better performance use int64 input
  warn('Tensor is int32: upgrading to int64; for better performance …
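If the custom classifier scores one document at a time, every call to explain_instance triggers thousands of predictions (num_samples defaults to 5000), which can look like an endless loop. A minimal sketch of a batched classifier function, assuming a fastai v1 text learner named learn and the variables from the question:

import numpy as np
from lime.lime_text import LimeTextExplainer

def predproba_list(texts):
    # LIME passes a list of perturbed strings and expects a single
    # (n_samples, n_classes) float array of class probabilities back.
    probs = []
    for t in texts:
        # fastai v1's learn.predict returns (category, class_index, probabilities);
        # adapt this line to however your model scores a single document.
        probs.append(learn.predict(t)[2].numpy())
    return np.vstack(probs).astype(np.float64)

explainer = LimeTextExplainer()
# num_samples defaults to 5000, so a per-document prediction loop can look like
# a hang; lowering it keeps the call manageable while debugging.
exp = explainer.explain_instance(df_val_final.Description[idx], predproba_list,
                                 num_features=5, top_labels=2, num_samples=500)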
Topic: lime
Category: Data Science

How to print 2-grams in LimeTextExplainer

I am trying to explain the importance of words in a sentence using the following pipeline with LimeTextExplainer from the LIME package:

Pipeline(steps=[('vect', CountVectorizer()), ('tfidf', TfidfTransformer()), ('clf', LogisticRegression())])

When I explain a sentence using the code below, the "importance" of single words is shown, whereas I want pairs of words:

explainer.explain_instance(text, cls.predict_proba, num_features=7)
exp.show_in_notebook(text=False)

Is it possible to display the importance of pairs of words?
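LimeTextExplainer perturbs and reports individual tokens regardless of the n-gram range used inside the pipeline, so one possible workaround is to join adjacent words into single underscore-separated tokens both before training and at explanation time. A rough sketch, where train_texts, train_labels and text stand in for the questioner's own data:

import re
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from lime.lime_text import LimeTextExplainer

def to_bigram_tokens(doc):
    # Rewrite "the movie was bad" as "the_movie movie_was was_bad" so that each
    # "word" LIME perturbs and reports is actually a pair of original words.
    words = re.findall(r"\w+", doc.lower())
    return " ".join(f"{a}_{b}" for a, b in zip(words, words[1:]))

cls = Pipeline(steps=[('vect', CountVectorizer()),
                      ('tfidf', TfidfTransformer()),
                      ('clf', LogisticRegression())])
cls.fit([to_bigram_tokens(d) for d in train_texts], train_labels)

explainer = LimeTextExplainer()
exp = explainer.explain_instance(to_bigram_tokens(text), cls.predict_proba, num_features=7)
exp.show_in_notebook(text=False)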
Topic: lime nlp python
Category: Data Science

Multi-valued categorical features in LIME

I am working with the LIME implementation by Marco Ribeiro (https://github.com/marcotcr/lime). Specifically, I am utilizing the LimeTabularExplainer as I have a mixture of numerical and categorical features in my dataset. How would I represent categorical features that may take on ≥ 0 values in a single example? I understand that the API requires categorical features to be converted to an integer representation, but how would I represent combinations of values for one categorical feature? To illustrate the circumstance, see the …
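One way to handle a feature that can hold several values at once is to expand it into one binary column per possible value, each registered with LIME as an ordinary categorical feature. A toy sketch with made-up column names:

import numpy as np
from lime.lime_tabular import LimeTabularExplainer

# Hypothetical multi-valued feature "genre" expanded into one 0/1 column per
# possible value, alongside a numeric "age" column.
genre_values = ['action', 'comedy', 'drama']
feature_names = ['age'] + [f'genre={g}' for g in genre_values]

X_train = np.array([[25, 1, 0, 1],
                    [40, 0, 1, 0],
                    [33, 1, 1, 0]], dtype=float)

categorical_features = [1, 2, 3]                      # indices of the flag columns
categorical_names = {i: ['absent', 'present'] for i in categorical_features}

explainer = LimeTabularExplainer(X_train,
                                 feature_names=feature_names,
                                 categorical_features=categorical_features,
                                 categorical_names=categorical_names,
                                 mode='classification')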
Category: Data Science

Lime explainer - Numpy broadcast error

I am working on an ML tutorial project with my own dataset. I built a model using the training dataset and generated predictions on the test dataset; the shape of the test dataset is (418, 10). My code for model training and prediction is below:

rfs_clf = forest_clf = RandomForestClassifier(n_estimators=110, max_depth=8, max_features='auto', random_state=0, oob_score=False, min_samples_split=2, criterion='gini', min_samples_leaf=2, bootstrap=False)
rfs_clf.fit(X_train, y_train)
y_f_predict = rfs_clf.predict_proba(X_test).astype(float)

Now I am trying to explain the predictions using the Lime package available here:

explainer = lime.lime_tabular.LimeTabularExplainer(X_train.values, feature_names = …
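For reference, explain_instance expects a single 1D row with exactly the columns the explainer was built on. A minimal sketch of the expected shapes, reusing the names from the question:

import lime.lime_tabular

explainer = lime.lime_tabular.LimeTabularExplainer(
    X_train.values,                       # 2D training matrix, shape (n_rows, 10)
    feature_names=list(X_train.columns),
    mode='classification')

# explain_instance wants one 1D row with the same 10 columns the explainer was
# built on; passing a (1, 10) slice, or a test frame with extra or missing
# columns, is a typical cause of numpy broadcast errors.
row = X_test.values[0]                    # shape (10,)
exp = explainer.explain_instance(row, rfs_clf.predict_proba, num_features=5)
print(exp.as_list())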
Category: Data Science

Is it possible to use only a training data sample for creating a LIME model explainer?

I have been looking into outputting a model explainer artefact at the time of training my Keras+TensorFlow neural network. LIME seems like a great choice; however, my data is very big and I read it from disk one batch at a time, as it is impractical and inefficient to store in memory. LIME appears to require the whole training dataset to be passed in for it to be able to create a surrogate model. Is it appropriate to use only a sample …
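A sketch of building the explainer from a sample rather than the full dataset; n_train_rows, load_rows, feature_names, x_row and model are placeholders for the questioner's own pipeline:

import numpy as np
from lime.lime_tabular import LimeTabularExplainer

# Draw a representative sample of training rows into memory.
rng = np.random.default_rng(0)
sample_idx = rng.choice(n_train_rows, size=5000, replace=False)
X_sample = load_rows(sample_idx)          # hypothetical batched loader, shape (5000, n_features)

explainer = LimeTabularExplainer(X_sample,
                                 feature_names=feature_names,
                                 mode='classification')

# The training data is only used to estimate per-feature statistics for the
# perturbation step, so a representative sample is usually sufficient.
# Assumes model.predict returns an (n_samples, n_classes) probability array.
exp = explainer.explain_instance(x_row, model.predict, num_features=10)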
Category: Data Science

Beginner level: how to interpret LIME and classification result

I am new to the concept of model interpretability using the LIME method. I am following the tutorial LIME for spectrogram classification. I am finding it hard to understand the color coding -- before using LIME the important features were already visible, but after applying LIME there is no way to see the colors that highlight the important features. The last set of images below the section "Compute LIME" shows the plot of spectrograms before and after LIME -- to me they look …
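For orientation, the highlighted colors in such tutorials typically come from overlaying the mask returned by get_image_and_mask on the input image. A hedged sketch, assuming spectrogram is an RGB array and predict_fn returns class probabilities for a batch of images:

from skimage.segmentation import mark_boundaries
from lime import lime_image

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(spectrogram, predict_fn,
                                         top_labels=2, hide_color=0,
                                         num_samples=1000)

# get_image_and_mask returns the image plus a mask over the superpixels that
# pushed the prediction towards the chosen label; overlaying that mask is what
# produces the colored highlights.
img, mask = explanation.get_image_and_mask(explanation.top_labels[0],
                                           positive_only=True,
                                           num_features=5,
                                           hide_rest=False)
overlay = mark_boundaries(img / 255.0, mask)   # divide by 255 only if the input was 0-255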
Category: Data Science

Need for LIME explainer

Is it possible to train a LIME explainer for a binary classifier on a dataset without labels? I need to understand the value of storing a LIME explainer object trained on the same data used to train the model. In general, does it make sense to keep a trained LIME explainer around to generate explanations during production, or is it better to train the LIME explainer on production data whenever it is needed? Another question: if I train a …
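For context, training_labels is an optional argument, so an explainer can be built from an unlabeled feature matrix and stored alongside the model. A sketch, with X_unlabeled and feature_names as placeholders:

import dill
from lime.lime_tabular import LimeTabularExplainer

# The explainer only uses the data to learn per-feature statistics for its
# perturbations, so no labels are required here.
explainer = LimeTabularExplainer(X_unlabeled,
                                 feature_names=feature_names,
                                 mode='classification')

# Persist it next to the model for reuse at prediction time; dill is often used
# instead of plain pickle because LIME's discretizer holds lambda functions.
with open('lime_explainer.pkl', 'wb') as f:
    dill.dump(explainer, f)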
Topic: lime training
Category: Data Science

Passing reduced/different feature data to LimeTabularExplainer compared to the original model

I am trying to use the LimeTabularExplainer class and the explain_instance function to find explanations for my LightGBM (lgb) model. However, the lgb model uses a complex feature set which is not interpretable. I want to pass a subset of the original features (which are interpretable) to the Lime explainer, so that my resulting explanations are also interpretable. In sections 3.1 and 3.3 of the original paper the authors talk about this: https://arxiv.org/abs/1602.04938

rf = sklearn.ensemble.RandomForestClassifier(n_estimators=500)
rf.fit(train, labels_train)
explainer = lime.lime_tabular.LimeTabularExplainer(train, feature_names=feature_names, class_names=target_names, discretize_continuous=True)
exp …
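The paper's idea of an interpretable representation can be approximated by building the explainer on the interpretable features and wrapping the model behind a function that maps them back to the full feature set. A sketch in which interpretable_to_model, train_interpretable and the other names are hypothetical placeholders:

import numpy as np
from lime.lime_tabular import LimeTabularExplainer

# interpretable_to_model is a function you would write yourself: it rebuilds
# the full (non-interpretable) feature vector the LightGBM model expects from
# one row of interpretable features.
def predict_fn(interpretable_rows):
    model_rows = np.array([interpretable_to_model(r) for r in interpretable_rows])
    return lgb_model.predict_proba(model_rows)   # assumes a sklearn-style LGBMClassifier

explainer = LimeTabularExplainer(train_interpretable,        # interpretable feature matrix
                                 feature_names=interpretable_feature_names,
                                 class_names=target_names,
                                 discretize_continuous=True)

exp = explainer.explain_instance(train_interpretable[0], predict_fn, num_features=5)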
Category: Data Science

Ways to visualize the outcome of machine learning interpretability techniques (for image classification)

“Machine Learning Interpretability” or “Explainable Artificial Intelligence” has become quite popular in the machine learning community and in recent research. The goal is to make complex (deep learning) models explainable, such that one can understand why a model made a particular decision. I had a look at various algorithms which do this (prominent ones like LIME, SHAP and Grad-CAM), but I have also skimmed over many papers that present very “special” approaches. Since I am working with image data, I am particularly …
Category: Data Science

Spam/ham classification

I am exploring the use of LIME for spam/ham categorisation. Specifically, I have a data frame containing a list of messages. I need to identify which messages are spam and which ones are ham by using a set of words (100), and then test the accuracy of the model. I found some articles on Towards Data Science and Medium that helped me a bit, but I would need a really small example of what I would need (already …
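A deliberately tiny, self-contained sketch of the usual LIME text workflow, using a few made-up messages in place of the real data frame:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Toy data only, to keep the sketch self-contained; in practice the messages
# and labels would come from the data frame described above.
messages = ["win a free prize now", "call now to claim cash",
            "are we still meeting tomorrow", "see you at lunch"]
labels = [1, 1, 0, 0]                       # 1 = spam, 0 = ham

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipe.fit(messages, labels)

explainer = LimeTextExplainer(class_names=['ham', 'spam'])
exp = explainer.explain_instance("claim your free prize", pipe.predict_proba,
                                 num_features=5)
print(exp.as_list())                        # each word with its spam/ham weight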
Category: Data Science

LIME is observing categorical features even though I am not passing any categorical features

Here is the code:

predict_fn = lambda x: xgb_model.predict_proba(x).astype(float)
feature_names = X_train.columns

for i in range(x_val.shape[0]):
    # Get the explanation for Logistic Regression
    val_point = x_val.values[i]
    print(val_point)
    print(val_point.shape)
    explainer = lime.lime_tabular.LimeTabularExplainer(training_data=Xs_train_array,
                                                       feature_names=feature_names,
                                                       training_labels=y_train,
                                                       mode='classification',
                                                       kernel_width=5)
    exp = explainer.explain_instance(val_point, predict_fn, num_features=10)
    exp.as_pyplot_figure()
    plt.tight_layout()

A few notes: Xs_train_array is of size (103,) and is of type float. There are no categorical variables. Here is the error message I'm receiving:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last) …
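One thing worth checking is the shape and type of what is handed to the explainer. A sketch of the shapes LIME's tabular API expects, reusing the names from the question:

import lime.lime_tabular

# training_data should be the full 2D training matrix, not a single 1D row,
# and feature_names is safest as a plain Python list rather than a pandas Index.
Xs_train_array = X_train.values                 # shape (n_rows, n_features)
feature_names = list(X_train.columns)

explainer = lime.lime_tabular.LimeTabularExplainer(training_data=Xs_train_array,
                                                   feature_names=feature_names,
                                                   training_labels=y_train,
                                                   mode='classification',
                                                   kernel_width=5)

val_point = x_val.values[0]                     # one 1D row, shape (n_features,)
exp = explainer.explain_instance(val_point, predict_fn, num_features=10)
exp.as_pyplot_figure()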
Category: Data Science
