Having trouble scaling scores of logistic regression

Question

Ach113

February 16, 2021, 22:01

I am constructing a credit scorecard using logistic regression, similar to the one shown here. However, when I try to convert the logistic regression coefficients into scores (by scaling the values with the provided formula), I get numbers that don't make much sense.

Formula used for calculating scores:

Score_i= (βi × WoE_i + α/n) × Factor + Offset/n

where βi is the logistic regression coefficient of variable i,
WoE_i is the weight of evidence of the corresponding variable,
α is the intercept of the logistic regression,
Factor is calculated as PDO / ln(2),
Offset is calculated as target_points - (Factor × ln(target_odds)), and
n is the number of variables used in the regression.

In my case PDO = 50, target_odds = 2, target_points = 500, n = 81, and intercept is -0.12686514.
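
For reference, here is a minimal Python sketch of the scaling with the values above. The function name `bin_score` and the example coefficient and WoE passed to it are hypothetical and only illustrate the arithmetic; they are not taken from my actual model.

```python
import math

# Values stated in the question
pdo = 50
target_odds = 2
target_points = 500
n = 81
intercept = -0.12686514

# Factor = PDO / ln(2); Offset = target_points - Factor * ln(target_odds)
factor = pdo / math.log(2)
offset = target_points - factor * math.log(target_odds)

def bin_score(beta, woe):
    """Points contributed by one variable, per the formula above."""
    return (beta * woe + intercept / n) * factor + offset / n

# Hypothetical coefficient and WoE, for illustration only
example = bin_score(beta=0.5, woe=0.2)
```

With these settings Factor is about 72.13 and, because target_odds equals 2, Offset works out to 450. A variable with β × WoE = 0 therefore contributes roughly 5.44 points.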

Here is an example of one of my features:

As can be seen, both the WoE and the coefficients increase as the revenue variable increases. The score, however, does not. Initially I assumed the scores were simply inversely related to those two values, so I added a negative sign in front of the formula:

Score_i= -(βi × WoE_i + α/n) × Factor + Offset/n

But for some other features, the score moves in the same direction as the other values:

Adding the negative sign would then give this feature scores that do not make much sense.

How can I keep the scaling consistent? What am I doing incorrectly here?

Topic scoring logistic-regression

Category Data Science

Ach113 · Accepted Answer · January 17, 2021, 21:10

It turns out I was using the wrong formula for scaling the scores. The formula

Score_i= (βi × WoE_i + α/n) × Factor + Offset/n

applies only if the logistic regression is fit after the WoE transform, meaning each feature (not each bin!) is replaced by its WoE values before the classifier is fit.

What I was doing instead was binning first and feeding dummy-variable columns for the bins into the classifier.
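
A rough sketch of the WoE transform step, in plain Python. The helper `woe_table`, the toy data, and the sign convention (WoE = ln(share of goods / share of bads), with target 1 meaning bad) are assumptions for illustration; some references use the inverse ratio, which flips the sign.

```python
import math

def woe_table(bins, targets):
    """Map each bin label to its weight of evidence.

    Assumes WoE = ln(share of goods / share of bads), target 1 = bad.
    Bins with zero goods or zero bads would need smoothing; that is
    not handled in this sketch.
    """
    goods, bads = {}, {}
    for b, y in zip(bins, targets):
        goods.setdefault(b, 0)
        bads.setdefault(b, 0)
        if y == 1:
            bads[b] += 1
        else:
            goods[b] += 1
    total_good = sum(goods.values())
    total_bad = sum(bads.values())
    return {
        b: math.log((goods[b] / total_good) / (bads[b] / total_bad))
        for b in goods
    }

# Hypothetical binned revenue feature and default flags (1 = bad)
revenue_bin = ["low", "low", "low", "low", "high", "high", "high", "high"]
default = [1, 1, 1, 0, 1, 0, 0, 0]

table = woe_table(revenue_bin, default)

# One numeric column per feature, instead of one dummy column per bin
revenue_woe = [table[b] for b in revenue_bin]
```

The classifier is then fit on columns like `revenue_woe`, so it produces a single coefficient per feature, which is what the formula expects.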

Reference to correct approach
