How to Use SHAP Kernel Explainer with Pipeline Models?

Question

Nayana Madhu

June 1, 2021, 17:16

I have a pandas DataFrame X, and I would like to explain the predictions of a particular model.

My model is given below:

pipeline = Pipeline(steps=[
        ('imputer', imputer_function()),
        ('classifier', RandomForestClassifier())
    ])
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
y_pred = pipeline.fit(x_train, y_train).predict(x_test)

Now, to explain the predictions, I use the Kernel Explainer from SHAP:

# use Kernel SHAP to explain test set predictions
shap.initjs()

explainer = shap.KernelExplainer(pipeline.predict_proba, x_train, link="logit")

shap_values = explainer.shap_values(x_test, nsamples=10)

# plot the SHAP values for the Setosa output of the first instance
shap.force_plot(explainer.expected_value[0], shap_values[0][0,:], x_test.iloc[0,:], link="logit")

When I run the code, I get the error:

ValueError: Specifying the columns using strings is only supported for pandas DataFrames.

Provided model function fails when applied to the provided data set.

ValueError: Specifying the columns using strings is only supported for pandas DataFrames

Can anyone please help me? I'm really stuck with this. Both x_train and x_test are pandas data frames.

Topic data-science-model machine-learning-model ipython machine-learning

Category Data Science

ntnq · Accepted Answer · June 1, 2021, 17:16

I tried to create a wrapper function as suggested, but it didn't work for my code. However, following an example on Kaggle, I found the solution below:

import shap

# load the JS visualization code in the notebook
shap.initjs()

# build the tree explainer from the final model of the pipeline
explainer = shap.TreeExplainer(pipeline['classifier'])

# apply the preprocessing to x_test
observations = pipeline['imputer'].transform(x_test)

# get SHAP values from the preprocessed data
shap_values = explainer.shap_values(observations)

# plot the feature importance
shap.summary_plot(shap_values, x_test, plot_type="bar")
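
If the pipeline has more than one preprocessing step, the same idea can be applied by slicing off the final estimator. The following is only a minimal sketch under assumptions: the data are synthetic, the `scaler` step is hypothetical and not part of the original pipeline, and the column names are reused only because these particular preprocessing steps keep the columns and their order unchanged.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical; not the asker's dataset).
rng = np.random.default_rng(0)
x_test = pd.DataFrame(rng.normal(size=(40, 3)), columns=["a", "b", "c"])
x_test.iloc[::5, 1] = np.nan  # a few missing values for the imputer
y = (x_test["a"] > 0).astype(int)

pipeline = Pipeline([
    ("imputer", SimpleImputer()),
    ("scaler", StandardScaler()),  # hypothetical extra preprocessing step
    ("classifier", RandomForestClassifier(n_estimators=20, random_state=0)),
])
pipeline.fit(x_test, y)

# pipeline[:-1] is a sub-pipeline holding every step except the final model.
# Wrapping the result in a DataFrame keeps the feature names for plotting.
observations = pd.DataFrame(
    pipeline[:-1].transform(x_test),
    columns=x_test.columns,
    index=x_test.index,
)
print(observations.head())
```

The resulting `observations` frame can then be passed to `explainer.shap_values(...)` and `shap.summary_plot(...)` in place of the bare array, so the plot is labeled with feature names.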

Nayana Madhu · Accepted Answer · May 26, 2019, 10:08

The reason is that Kernel SHAP passes the data to the model as a NumPy array, which has no column names. We need to wrap the model so that the column names are restored, as follows:

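# assumes estimator is the fitted pipeline and feature_names = list(x_train.columns)
def model_predict(data_asarray):
    data_asframe = pd.DataFrame(data_asarray, columns=feature_names)
    # return probabilities: link='logit' and expected_value[0] below expect them
    return estimator.predict_proba(data_asframe)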

Then,

shap_kernel_explainer = shap.KernelExplainer(model_predict, x_train, link='logit')
shap_values_single = shap_kernel_explainer.shap_values(x_test.iloc[0, :])
shap.force_plot(shap_kernel_explainer.expected_value[0], np.array(shap_values_single[0]), x_test.iloc[0, :], link='logit')
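
To make the wrapper idea concrete, here is a small self-contained sketch that can be run without SHAP. It rests on assumptions: the data are synthetic, and a `ColumnTransformer` that selects columns by name stands in for the unspecified `imputer_function()`, since selecting columns by name is one common way a pipeline ends up requiring DataFrame input. The sketch shows the pipeline rejecting a bare array while the wrapper accepts the same array.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (hypothetical; not the asker's dataset).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(60, 3)), columns=["a", "b", "c"])
X.iloc[::7, 0] = np.nan  # a few missing values for the imputer
y = (X["b"] + X["c"] > 0).astype(int)

# Assumed stand-in for imputer_function(): columns are selected by name,
# which ties the pipeline to DataFrame input.
imputer = ColumnTransformer([("impute", SimpleImputer(), ["a", "b", "c"])])
pipeline = Pipeline([
    ("imputer", imputer),
    ("classifier", RandomForestClassifier(n_estimators=20, random_state=0)),
])
pipeline.fit(X, y)

feature_names = list(X.columns)

def model_predict(data_asarray):
    """Rebuild the DataFrame the pipeline expects, then return probabilities."""
    data_asframe = pd.DataFrame(data_asarray, columns=feature_names)
    return pipeline.predict_proba(data_asframe)

# A bare array passed straight to the pipeline is expected to be rejected ...
try:
    pipeline.predict_proba(X.to_numpy())
except ValueError as err:
    print("raw array rejected:", err)

# ... while the wrapper accepts the same array.
proba = model_predict(X.to_numpy())
print(proba.shape)
```

A function like `model_predict` is what gets passed to `shap.KernelExplainer` in place of `pipeline.predict_proba`, because the explainer calls it with plain arrays.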
