What is the proper way to bin variables for calculating WoE during credit scoring?

Question

Ach113

December 23, 2020, 23:20

I have read this article about developing a credit scorecard in Python, which states that when binning continuous variables, one needs to ensure that:

1. Each bin should have at least 5% of the observations
2. Each bin should contain a non-zero number of both good and bad loans
3. The WoE should be distinct for each category. Similar groups should be aggregated or binned together, because bins with similar WoE have almost the same proportion of good or bad loans, implying the same predictive power
4. The WoE should be monotonic, i.e., either increasing or decreasing across the bins
5. Missing values are binned separately

This seems like a lot of work to do manually: each column needs to be divided into bins, each of the five conditions needs to be checked, and then the bins have to be adjusted and the conditions checked again. Is there a faster way to do it? Or is there any algorithm/function that bins continuous variables in the most practical way?

Topic scoring

Category Data Science

Erwan · Accepted Answer · December 23, 2020, 23:20

"Or is there any algorithm/function that bins continuous variables in the most practical way?"

Sure there is, but that's the wrong question: the standard method for discretizing a continuous variable is simply to split its range into equal-width intervals, and that's it. Of course this doesn't guarantee any of the five conditions, since they are additional constraints that depend almost exclusively on expert knowledge and the specifics of the data.
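
To make this concrete, here is a minimal sketch of equal-width binning followed by a per-bin WoE calculation, in plain Python. The function names and the toy data are made up for illustration, and the WoE convention used (log of the share of goods over the share of bads in a bin) is an assumption, since some references use the inverse ratio. In practice a library routine such as pandas `cut` would normally do the binning step.

```python
import math

def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def woe_per_bin(bin_ids, is_bad, n_bins):
    """Assumed convention: WoE_i = ln(share of all goods in bin i / share of all bads in bin i)."""
    goods = [0] * n_bins
    bads = [0] * n_bins
    for b, bad in zip(bin_ids, is_bad):
        if bad:
            bads[b] += 1
        else:
            goods[b] += 1
    total_good, total_bad = sum(goods), sum(bads)
    woe = []
    for g, b in zip(goods, bads):
        if g == 0 or b == 0:
            woe.append(None)  # WoE is undefined when a bin has no goods or no bads
        else:
            woe.append(math.log((g / total_good) / (b / total_bad)))
    return woe

# Toy data, invented for illustration: income in thousands, 1 = bad loan.
income = [10, 12, 15, 18, 22, 25, 30, 35, 40, 45, 50, 60]
is_bad = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0]

bin_ids = equal_width_bins(income, 3)
woe = woe_per_bin(bin_ids, is_bad, 3)
```

On this toy data the top income bin contains no bad loans, so its WoE is undefined: equal-width binning alone already fails condition 2.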

Note that checking these conditions can certainly be automated; there's no need for manual verification. There might be some domain-specific packages that do this for you, but there's no reason for a standard ML/statistics library to provide methods for every specific problem like this one.
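
As a rough sketch of what such automation could look like, the function below checks conditions 1 to 4 from per-bin counts of good and bad loans. Everything here is hypothetical: the name `check_bins`, the `min_woe_gap` threshold used to decide whether two WoE values are "distinct", and the choice to compare only adjacent bins. It assumes the bins are passed in order and that the separate bin for missing values (condition 5) is left out, since it has no natural position in the ordering.

```python
import math

def check_bins(goods, bads, min_share=0.05, min_woe_gap=0.1):
    """Check conditions 1-4 for ordered bins, given per-bin good/bad counts.

    min_woe_gap is an arbitrary, hypothetical threshold for "distinct" WoE.
    """
    total_good, total_bad = sum(goods), sum(bads)
    total = total_good + total_bad
    report = {}
    # Condition 1: each bin holds at least min_share of all observations.
    report["min_share"] = all((g + b) / total >= min_share
                              for g, b in zip(goods, bads))
    # Condition 2: each bin contains both good and bad loans.
    report["non_zero"] = all(g > 0 and b > 0 for g, b in zip(goods, bads))
    if not report["non_zero"]:
        # WoE is undefined for some bin, so conditions 3 and 4 cannot hold.
        report["distinct"] = False
        report["monotonic"] = False
        return report
    woe = [math.log((g / total_good) / (b / total_bad))
           for g, b in zip(goods, bads)]
    diffs = [nxt - cur for cur, nxt in zip(woe, woe[1:])]
    # Condition 3 (simplified): adjacent bins differ by at least min_woe_gap.
    report["distinct"] = all(abs(d) >= min_woe_gap for d in diffs)
    # Condition 4: WoE strictly increases or strictly decreases across bins.
    report["monotonic"] = all(d > 0 for d in diffs) or all(d < 0 for d in diffs)
    return report

# Invented counts for three ordered bins.
ok = check_bins(goods=[20, 30, 40], bads=[10, 6, 4])
empty_bad = check_bins(goods=[20, 30, 40], bads=[10, 0, 4])
not_monotonic = check_bins(goods=[20, 40, 30], bads=[10, 4, 6])
```

A loop that merges adjacent bins whenever a check fails and then re-runs `check_bins` would automate the adjust-and-recheck cycle described in the question, though which bins to merge is still a judgment call that depends on the data.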
