LIME is finding categorical features even though I am not passing any categorical features

Here is the code:

```python
import lime.lime_tabular
import matplotlib.pyplot as plt

# LIME expects a function that returns class probabilities as floats
predict_fn = lambda x: xgb_model.predict_proba(x).astype(float)

feature_names = X_train.columns

# The explainer depends only on the training data, so it is built once
explainer = lime.lime_tabular.LimeTabularExplainer(training_data=Xs_train_array,
                                                   feature_names=feature_names,
                                                   training_labels=y_train,
                                                   mode='classification',
                                                   kernel_width=5)

for i in range(x_val.shape[0]):

    # Get the explanation for the XGBoost model
    val_point = x_val.values[i]
    print(val_point)
    print(val_point.shape)

    exp = explainer.explain_instance(val_point, predict_fn, num_features=10)
    exp.as_pyplot_figure()
    plt.tight_layout()
```

A few notes:

  1. Xs_train_array has shape (103,) and is of type float.
  2. There are no categorical variables.
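Note 1 reports a 1-D shape for Xs_train_array, while `LimeTabularExplainer` expects `training_data` to be a 2-D array of shape (n_samples, n_features), and each row passed to `explain_instance` should have as many entries as the training data has columns. A minimal sketch of a shape check, assuming only the `.shape` tuples are available; `check_lime_shapes` is a hypothetical helper, not part of LIME:

```python
def check_lime_shapes(train_shape, row_shape):
    """Return a list of problems; an empty list means the shapes look compatible.

    train_shape: shape of the array passed as training_data (e.g. Xs_train_array.shape)
    row_shape:   shape of the row passed to explain_instance (e.g. val_point.shape)
    """
    problems = []
    # LIME's tabular explainer expects (n_samples, n_features) training data
    if len(train_shape) != 2:
        problems.append(
            f"training_data should be 2-D (n_samples, n_features), got {train_shape}")
    # A single instance to explain should be a flat vector of features
    if len(row_shape) != 1:
        problems.append(f"data_row should be 1-D (n_features,), got {row_shape}")
    # Both must agree on the number of features
    if len(train_shape) == 2 and len(row_shape) == 1 and train_shape[1] != row_shape[0]:
        problems.append(
            f"training_data has {train_shape[1]} features but data_row has {row_shape[0]}")
    return problems


# Example with hypothetical shapes
print(check_lime_shapes((103,), (103,)))      # 1-D training data is flagged
print(check_lime_shapes((500, 87), (103,)))   # feature-count mismatch is flagged
print(check_lime_shapes((500, 103), (103,)))  # []
```

In the loop above it could be called as `check_lime_shapes(Xs_train_array.shape, val_point.shape)` just before `explain_instance`.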

Here is the error message I'm receiving:

```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-24-ee5cbeeb11ea> in <module>
     19     #print("Validation Prediction Probabilities: {}".format(xgb_model.predict_proba(val_point)))
     20 
---> 21     exp = explainer.explain_instance(val_point, predict_fn, num_features=10)
     22     exp.as_pyplot_figure()
     23     plt.tight_layout()

~/opt/anaconda3/lib/python3.7/site-packages/lime/lime_tabular.py in explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    335             # Preventative code: if sparse, convert to csr format if not in csr format already
    336             data_row = data_row.tocsr()
--> 337         data, inverse = self.__data_inverse(data_row, num_samples)
    338         if sp.sparse.issparse(data):
    339             # Note in sparse case we don't subtract mean since data would become dense

~/opt/anaconda3/lib/python3.7/site-packages/lime/lime_tabular.py in __data_inverse(self, data_row, num_samples)
    534         inverse = data.copy()
    535         for column in categorical_features:
--> 536             values = self.feature_values[column]
    537             freqs = self.feature_frequencies[column]
    538             inverse_column = self.random_state.choice(values, size=num_samples,

KeyError: 87
```

This confuses me because I never passed categorical_features, so I don't know where the 87 is coming from.
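A likely source of the 87, assuming LIME's default `discretize_continuous=True`: the explainer discretizes every column and then treats all of them internally as categorical, storing per-column values keyed by training-column index, while `__data_inverse` iterates over the columns of the row being explained. If that row has more columns than the training data, the first index the training data never had raises `KeyError`. The sketch below is a simplified pure-Python model of that lookup, not LIME's actual code; the function names and the 87 vs. 103 column counts are hypothetical.

```python
from collections import Counter


def build_feature_values(training_rows):
    """Simplified stand-in for the per-feature table LIME builds in __init__.

    Assumes discretize_continuous=True, so every training column gets an entry,
    keyed by column index 0..n_train_cols-1.
    """
    n_cols = len(training_rows[0])
    feature_values = {}
    for col in range(n_cols):
        counts = Counter(row[col] for row in training_rows)
        feature_values[col] = sorted(counts)  # distinct values seen in this column
    return feature_values


def lookup_all_columns(feature_values, data_row):
    """Simplified stand-in for the loop in __data_inverse.

    Iterates over the columns of the row being explained, so any column index
    the training data never had raises KeyError.
    """
    looked_up = []
    for col in range(len(data_row)):
        looked_up.append(feature_values[col])
    return looked_up


if __name__ == "__main__":
    # Hypothetical shapes: 87 training columns vs. a 103-value row
    training_rows = [[0] * 87, [1] * 87]
    data_row = [0] * 103
    table = build_feature_values(training_rows)
    try:
        lookup_all_columns(table, data_row)
    except KeyError as err:
        print("KeyError:", err)  # prints "KeyError: 87"
```

Under this reading, the key in the error is the number of columns in the training data, which is why comparing `Xs_train_array.shape` with `val_point.shape` is a reasonable first check.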

I used the exact same code on another dataset without any errors, so I can't figure out what is causing this. Any help would be much appreciated.

Topic lime xgboost python

Category Data Science
