What is the proper way to bin variables for calculating WoE during credit scoring?
I have read this article about developing a credit scorecard in Python, which states that when binning continuous variables, the following should hold:
1. Each bin should have at least 5% of the observations
2. Each bin should contain a non-zero number of both good and bad loans
3. The WoE should be distinct for each category. Similar groups should be aggregated or binned together, because bins with similar WoE have almost the same proportion of good and bad loans and therefore the same predictive power
4. The WoE should be monotonic, i.e., either increasing or decreasing across the bins
5. Missing values are binned separately
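To make the checks concrete, here is a minimal sketch of how conditions 1 to 4 could be tested for one already-binned variable. It assumes the common convention WoE = ln(share of goods / share of bads); the function names `woe_table` and `check_bins`, the example counts, and the `min_woe_gap` threshold used for "distinct" are all hypothetical choices for illustration, not taken from the article. The bin for missing values (condition 5) is assumed to be kept apart and is not passed in.

```python
import math

def woe_table(bins):
    """bins: list of (label, n_good, n_bad), ordered by bin edge,
    excluding the separate bin for missing values."""
    total_good = sum(good for _, good, _ in bins)
    total_bad = sum(bad for _, _, bad in bins)
    total = total_good + total_bad
    rows = []
    for label, good, bad in bins:
        if good == 0 or bad == 0:
            woe = None  # WoE is undefined here (condition 2 fails)
        else:
            # Assumed convention: ln(share of goods / share of bads)
            woe = math.log((good / total_good) / (bad / total_bad))
        rows.append({"bin": label, "share": (good + bad) / total, "woe": woe})
    return rows

def check_bins(rows, min_share=0.05, min_woe_gap=0.1):
    """Return a pass/fail flag for conditions 1 to 4.
    min_woe_gap is an arbitrary illustrative threshold for 'distinct'."""
    woes = [r["woe"] for r in rows]
    defined = all(w is not None for w in woes)
    diffs = [b - a for a, b in zip(woes, woes[1:])] if defined else []
    return {
        "min_share": all(r["share"] >= min_share for r in rows),
        "non_zero": defined,
        "distinct": defined and all(abs(d) >= min_woe_gap for d in diffs),
        "monotonic": defined and (all(d > 0 for d in diffs)
                                  or all(d < 0 for d in diffs)),
    }

# Hypothetical counts of (good, bad) loans per age band.
rows = woe_table([("<25", 300, 100), ("25-40", 500, 80), ("40+", 400, 20)])
checks = check_bins(rows)
print([round(r["woe"], 2) for r in rows])  # [-0.69, 0.04, 1.2]
print(checks)
```

With these made-up counts all four flags come out true; changing the middle bin to, say, 500 good and 200 bad loans breaks monotonicity.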
This seems like a lot of work to do manually: each column needs to be divided into bins, each of the five conditions needs to be checked, and then the bins have to be adjusted and the conditions checked again. Is there a faster way to do it? Or is there an algorithm or function that bins continuous variables in the most practical way?
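For illustration, the manual loop described above could be automated roughly as follows. This is only a sketch of one possible greedy heuristic, not a method from the article and not an optimal binning: start from many equal-frequency bins, then repeatedly merge a bin into its neighbour whenever it is too small, lacks goods or bads, or breaks the monotonic trend in bad rate. Because WoE is a strictly monotonic function of a bin's bad rate, a monotonic bad rate implies a monotonic WoE. The function name `monotonic_bins`, its parameters, and the simulated data are hypothetical, and missing values are assumed to have been removed beforehand and given their own bin.

```python
import random

def monotonic_bins(values, bad_flags, n_start=20, min_share=0.05):
    """Greedy coarse classing. bad_flags: 1 = bad loan, 0 = good loan.
    Returns a list of [upper_edge, n_good, n_bad]."""
    pairs = sorted(zip(values, bad_flags))
    n = len(pairs)
    target = n / n_start

    # Fine classing: roughly equal-frequency bins; tied values stay together.
    bins = []
    for value, flag in pairs:
        if bins and (bins[-1][1] + bins[-1][2] < target or bins[-1][0] == value):
            bins[-1][0] = value
            bins[-1][1] += 1 - flag
            bins[-1][2] += flag
        else:
            bins.append([value, 1 - flag, flag])

    # Pick the trend direction by comparing the lower and upper halves.
    half = n // 2
    low_rate = sum(f for _, f in pairs[:half]) / half
    high_rate = sum(f for _, f in pairs[half:]) / (n - half)
    increasing = high_rate >= low_rate

    def bad_rate(b):
        return b[2] / (b[1] + b[2])

    def needs_merge(i):
        b = bins[i]
        too_small = b[1] + b[2] < min_share * n
        empty_class = b[1] == 0 or b[2] == 0
        breaks_trend = i > 0 and (
            bad_rate(b) <= bad_rate(bins[i - 1]) if increasing
            else bad_rate(b) >= bad_rate(bins[i - 1]))
        return too_small or empty_class or breaks_trend

    # Merge one offending bin at a time until no condition is violated.
    changed = True
    while changed and len(bins) > 1:
        changed = False
        for i in range(len(bins)):
            if needs_merge(i):
                lo = i - 1 if i > 0 else 0  # merge into the left neighbour
                hi = lo + 1
                bins[lo] = [bins[hi][0], bins[lo][1] + bins[hi][1],
                            bins[lo][2] + bins[hi][2]]
                del bins[hi]
                changed = True
                break
    return bins

# Simulated data (an assumption): default probability falls with age.
random.seed(0)
ages = [random.uniform(18, 70) for _ in range(2000)]
flags = [1 if random.random() < 0.4 - 0.005 * (a - 18) else 0 for a in ages]
result = monotonic_bins(ages, flags)
for edge, good, bad in result:
    print(f"<= {edge:5.1f}  good={good:4d}  bad={bad:4d}")
```

The merge rule here always folds a bin into its left neighbour, which is simple but can produce coarser bins than necessary; it does not enforce a minimum WoE gap between bins, only strict monotonicity.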