How to check for "statistical significance" of a categorical feature in black-box models

Let's say we have a categorical feature $X_i$ and we have built a black-box classification model, such as XGBoost, with $X_i$ as one of many predictors. We'd like to ask: does $X_i$ affect the overall prediction and, if so, by how much?

In particular $X_i$ could be:

  • a dichotomous variable
  • an n-level variable where we are interested in the potential difference between two particular levels

In white-box models like linear regression we have tests for statistical significance. But can we obtain something akin to statistical significance with black-box models? Is any tool from the field of explainable artificial intelligence applicable here? Or would it be better to just perform a standard t-test on the predicted probabilities?

Topic predictor-importance xgboost machine-learning

Category Data Science


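First, you have to encode the feature. Most models expect numerical inputs, so a categorical feature is usually converted to numbers first, for example with one-hot or ordinal encoding.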
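As a minimal sketch of the encoding step, assuming pandas is available: the toy DataFrame and the column names `color` and `size` are hypothetical stand-ins for $X_i$ and one other predictor, not something from the question.

```python
import pandas as pd

# Hypothetical toy data: "color" stands in for the categorical feature X_i.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": [1.0, 2.5, 0.7, 3.1],
})

# One-hot encoding: one 0/1 indicator column per level of "color".
one_hot = pd.get_dummies(df, columns=["color"], dtype=int)

# Ordinal (integer) codes: a single column, each level mapped to an integer.
# Levels are sorted alphabetically here, so the order carries no meaning.
df["color_code"] = df["color"].astype("category").cat.codes

print(one_hot)
print(df)
```

One-hot encoding keeps each level as its own column, which makes it easier to look at two particular levels later. Integer codes are more compact but impose an arbitrary order on the levels.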

Then, to assess how much the feature matters:

  • You can either look at the model's built-in feature importance

  • Or use an XAI tool that helps you understand the predictions. I normally use SHAP (SHapley Additive exPlanations), a game-theoretic approach to explaining the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.

XAI example

https://github.com/slundberg/shap
