Imposing a similar metric on data segments when training a model

I am training a binary classifier on a dataset, using AUC as the evaluation metric. The dataset has two main groups (I will refer to them as the good and the bad population). One property of this dataset is that the bad population has a higher proportion of target = 1.

For this reason, a fairly trivial classifier could simply give higher scores to the bad population and lower scores to the good population. Its global AUC could be quite high, yet when looking at the AUC within each population separately, the AUC might be really low in both of them.
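To make this failure mode concrete, here is a small sketch (not part of the original question) that simulates two groups with different base rates and scores samples purely by group membership; the group sizes, base rates, and noise scale are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical data: 1000 samples per group; the "bad" group has a much
# higher base rate of target = 1 than the "good" group.
n = 1000
group = np.r_[np.zeros(n), np.ones(n)]        # 0 = good, 1 = bad
y = np.r_[rng.binomial(1, 0.05, n),           # ~5% positives in the good group
          rng.binomial(1, 0.40, n)]           # ~40% positives in the bad group

# A "dummy" score that only encodes group membership (tiny noise breaks ties)
# and carries no information about the target within either group.
score = group + rng.normal(0, 1e-3, 2 * n)

print("global AUC:", roc_auc_score(y, score))                           # ~0.75
print("AUC | good:", roc_auc_score(y[group == 0], score[group == 0]))   # ~0.5
print("AUC | bad: ", roc_auc_score(y[group == 1], score[group == 1]))   # ~0.5
```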

I want to avoid this behavior. I am willing to sacrifice some global AUC so that the AUC within each group is not very low. One idea I had was to use the harmonic mean of the two groups' AUCs as the metric instead of the overall AUC, but this may not guide the classifier in a natural way.
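For reference, a per-group harmonic-mean AUC is straightforward to compute with scikit-learn; the sketch below uses a hypothetical helper name and assumes the group labels are available at evaluation time:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def harmonic_mean_auc(y_true, y_score, groups):
    """Harmonic mean of the per-group AUCs (hypothetical metric).

    A model that separates the groups but not the targets within them
    scores close to 0.5 here, even if its global AUC is high.
    Each group must contain both classes for the per-group AUC to exist.
    """
    aucs = [roc_auc_score(y_true[groups == g], y_score[groups == g])
            for g in np.unique(groups)]
    return len(aucs) / sum(1.0 / a for a in aucs)
```

Plugging such a metric into standard model-selection tooling is awkward because most scorers only receive the targets and scores, not the group labels, which is part of why this approach does not feel natural.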

Are there any papers, techniques, or software that can help me solve this problem in a more natural way?

Topic: binary-classification, machine-learning

Category: Data Science


Given that in your data there is a correlation between population type (good vs. bad) and the target, your model may learn undesirable associations between the two. The population type is therefore a confounding factor.

A natural tool for coping with confounders is causal inference. You can find an overview of causal inference in Judea Pearl's work, either this article or his book. A terser introduction can be found in Ferenc Huszár's blog, including an entry on controlling for confounders.

There are a few Python packages providing causal inference functionality, such as Microsoft's dowhy or Causalinference.
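As a rough illustration of how such a package is typically used, here is a minimal dowhy sketch. The CSV file and the column names (`feature_of_interest`, `population`, `target`) are hypothetical placeholders, and propensity score matching is just one of several backdoor-adjustment estimators dowhy offers:

```python
import pandas as pd
from dowhy import CausalModel

# Hypothetical dataset: "population" encodes good vs. bad and is declared
# as a common cause (confounder) of the treatment variable and the target.
df = pd.read_csv("my_dataset.csv")

model = CausalModel(
    data=df,
    treatment="feature_of_interest",  # hypothetical binary treatment column
    outcome="target",
    common_causes=["population"],
)

estimand = model.identify_effect(proceed_when_unidentifiable=True)
estimate = model.estimate_effect(
    estimand,
    method_name="backdoor.propensity_score_matching",
)
print(estimate.value)  # estimated effect after adjusting for the confounder
```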
