How to export shap waterfall values to dataframe?

Question

The Great

June 1, 2022 19:37

I am working on a binary classification problem using a random forest model and neural networks, and I am using SHAP to explain the model predictions. I followed the tutorial and wrote the code below to produce a waterfall plot for a single row:

import shap

row_to_show = 20
data_for_prediction = ord_test_t.iloc[row_to_show]  # one row of data (a pandas Series); multiple rows also work
data_for_prediction_array = data_for_prediction.values.reshape(1, -1)
rf_boruta.predict_proba(data_for_prediction_array)  # predicted class probabilities for this row
explainer = shap.TreeExplainer(rf_boruta)
# Calculate SHAP values (here indexed per class: [0] is the first class)
shap_values = explainer.shap_values(data_for_prediction)
# Waterfall plot of the first class's contributions for the selected row
shap.plots._waterfall.waterfall_legacy(explainer.expected_value[0], shap_values[0], ord_test_t.iloc[row_to_show])

This generated the waterfall plot.

However, I want to export these values to a dataframe. How can I do it?

I expect the output to be a table of these values, and I want to export it for the full dataframe. Can you help me, please?

Update: error message

----> 1 pd.DataFrame({
      2     "Feature Name": ["Base value"] + [f"Feature {i}" for i in range(ord_test_t.shape[1])],
      3     "Contribution": (explainer.expected_value[0]) + list(shap_values[0])
      4 })

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    634         elif isinstance(data, dict):
    635             # GH#38939 de facto copy defaults to False only in non-dict cases
--> 636             mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
    637         elif isinstance(data, ma.MaskedArray):
    638             import numpy.ma.mrecords as mrecords

~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in dict_to_mgr(data, index, columns, dtype, typ, copy)
    500         # TODO: can we get rid of the dt64tz special case above?
    501 
--> 502     return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
    503 
    504 

~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in arrays_to_mgr(arrays, columns, index, dtype, verify_integrity, typ, consolidate)
    118         # figure out the index, if necessary
    119         if index is None:
--> 120             index = _extract_index(arrays)
    121         else:
    122             index = ensure_index(index)

~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in _extract_index(data)
    672             lengths = list(set(raw_lengths))
    673             if len(lengths) > 1:
--> 674                 raise ValueError("All arrays must be of the same length")
    675 
    676             if have_dicts:

ValueError: All arrays must be of the same length
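
The length mismatch in this traceback can be reproduced without SHAP. The following is a minimal sketch, assuming explainer.expected_value[0] is a NumPy scalar; the numbers are made up and only stand in for the real values. Adding a NumPy scalar to a list is an element-wise addition, not a prepend, so the "Contribution" column ends up one element shorter than "Feature Name".

```python
import numpy as np

# Made-up stand-ins (assumptions, not values from the question)
base_value = np.float64(0.4)         # plays the role of explainer.expected_value[0]
row_shap = [0.10, -0.05, 0.02]       # plays the role of list(shap_values[0])

# NumPy scalar + list broadcasts: the base value is added to every element
broadcast = base_value + row_shap

# Wrapping the scalar in a list concatenates instead, giving one extra entry
prepended = [base_value] + row_shap

print(len(broadcast), len(prepended))
```

Wrapping the scalar in square brackets is what makes both columns the same length.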

Topic neural-network classification python predictive-modeling machine-learning

Category Data Science

Oxbowerce · Accepted Answer · March 31, 2022 09:01

The values plotted are simply the SHAP values stored in shap_values, where the SHAP value at index i is the SHAP value for the feature at index i in your original dataframe. The base value you mention is then simply the expected value stored in explainer.expected_value. Therefore, you should be able to quite easily create a dataframe yourself as follows:

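import pandas as pd

pd.DataFrame({
    # data_for_prediction is a single row (a Series), so use len() rather than shape[1]
    "Feature Name": ["Base value"] + [f"Feature {i}" for i in range(len(data_for_prediction))],
    # expected_value[0] is a scalar: wrap it in a list so it is prepended, not added element-wise
    "Contribution": [explainer.expected_value[0]] + list(shap_values[0])
})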
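
To export the contributions for every row rather than one, the same idea can be applied to a 2-D array of SHAP values. The following is only a sketch under stated assumptions: shap_to_long_frame is a hypothetical helper (not part of SHAP or pandas), and the base value, SHAP matrix, feature names, and row labels are made-up stand-ins. In practice they would come from explainer.expected_value and from explainer.shap_values called on the whole dataframe, whose exact return shape depends on the SHAP version.

```python
import numpy as np
import pandas as pd


def shap_to_long_frame(base_value, shap_matrix, feature_names, row_ids):
    """Hypothetical helper: one output row per (sample, feature) contribution."""
    shap_matrix = np.asarray(shap_matrix, dtype=float)
    # Wide layout: one row per sample, one column per feature
    wide = pd.DataFrame(shap_matrix, columns=list(feature_names), index=list(row_ids))
    # The base value is the same for every sample; store it as the first column
    wide.insert(0, "Base value", float(base_value))
    wide.index.name = "row"
    # Long layout: stack the columns into (row, Feature Name, Contribution)
    return wide.reset_index().melt(
        id_vars="row", var_name="Feature Name", value_name="Contribution"
    )


# Made-up numbers standing in for one class's base value and SHAP values
base = 0.4
values = np.array([[0.10, -0.05, 0.02],
                   [-0.20, 0.03, 0.01]])
frame = shap_to_long_frame(base, values, ["age", "income", "tenure"], [20, 21])
print(frame)
```

Filtering frame on a single value of "row" gives the numbers behind that row's waterfall plot, and the result can be written out with frame.to_csv if a file is needed.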
