Reasons for a model predicting a probability of class 1 at a given x value

All,

This is a general question. I have a binary classification model that predicts whether someone is rich. Someone asked me: if one person is given a probability of 0.6 of being rich, and another person is given the same probability, are the reasons WHY they are predicted rich the same?

I am using xgboost, and my instinct is to say no. E.g. if I were to profile each population (p = 0.5, p = 0.6, etc.), would I find differences in their features? I would say it is hard to tell, because the relationship between the features and the outcome is usually not linear and can be complex.

In general, I guess my question is: if two people are given the same probability of being class 1, will the model's reasons for giving each of them this 0.6 be the same? By 'reasons' I mean features/feature values.

Topic gradient-boosting-decision-trees xgboost decision-trees classification

Category Data Science


Not necessarily. While it can be the case that two observations belong to the same 'group' and end up in the same leaf node (and thus get the same predicted value), there can also be multiple distinct groups of observations that share the same predicted value. Whether this is the case in your example of course depends on the data you are using.

It would indeed be a time-intensive task to manually check why certain observations have a certain predicted value, but with the increased focus on interpretable/explainable machine learning there is a growing number of methods and Python packages that help explain a model's decisions. An example of such a method is Shapley values, which is implemented in the shap Python package and can be applied easily to tree ensembles (see the example on the linked GitHub page).
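To make the first point concrete, here is a minimal sketch using a single scikit-learn decision tree (the same idea carries over to an xgboost ensemble). The data and the feature names ("income", "savings") are made up for illustration: two observations both receive the 0.6 from your example, yet they reach it through different leaves, i.e. for different 'reasons'.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two separate feature regions that each contain
# 3 rich people out of 5, so both regions yield P(rich) = 0.6.
# Feature 0 ~ "income", feature 1 ~ "savings" (made-up names).
X = np.array(
    [[9, 1]] * 5 +   # region A: high income, low savings
    [[1, 9]] * 5     # region B: low income, high savings
)
y = np.array([1, 1, 1, 0, 0,    # region A: 3/5 rich
              1, 1, 1, 0, 0])   # region B: 3/5 rich

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

a = np.array([[9, 1]])  # person whose score is driven by "income"
b = np.array([[1, 9]])  # person whose score is driven by "savings"

p_a = clf.predict_proba(a)[0, 1]
p_b = clf.predict_proba(b)[0, 1]
leaf_a, leaf_b = clf.apply(a)[0], clf.apply(b)[0]

print(p_a, p_b)          # 0.6 0.6 -- identical predicted probabilities
print(leaf_a != leaf_b)  # True   -- reached through different leaves
```

A method like Shapley values would then quantify this: for person `a` the positive contribution would be attributed to the first feature, for person `b` to the second, even though the final probabilities are identical.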
