Mapping values in Logistic Regression

When mapping the probabilities obtained from logistic regression (via the sigmoid function) to 0s and 1s, we use a threshold value of 0.5. If the predicted probability lies above 0.5, it gets mapped to 1; if it lies below 0.5, it gets mapped to 0. What if the predicted probability is exactly 0.5? What does 0.5 get mapped to?

Topic: sigmoid, logistic-regression

Category: Data Science


(Throughout this, I will assume the classes are balanced. If that is not the case, $0.5$ is likely to be a poor threshold. As the links in my comments describe, thresholds themselves are arguably overrated anyway.)

The good news is that a predicted probability of exactly $0.5$ is unusual, so the situation is unlikely to arise in practice.

If it does come up, you have a few options. First, if the model is that uncertain about class membership, I wonder if you have any business making a classification at all; the discrete decision might be to go collect more data. Second, classifications usually carry some associated cost of misclassification. If it is bad to call a $1$ a $0$ but awful to call a $0$ a $1$, then you would be wary of classifying such a point as a $1$. You can apply similar logic to how much you “profit” from each type of correct classification.

If the misclassification costs are identical and the correct-classification profits are identical, then I say that it does not matter how you classify the point. Over the long haul, you will be wrong half the time, and both types of mistakes, which will happen equally often, incur equal costs. Likewise, you will be right half the time, with both ways of being right happening equally often and yielding the same profit. Your expected loss or gain is the same no matter how you classify the point.

This can be formalized through decision theory and expected loss.
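As a sketch (standard decision theory, with notation I am introducing here: $p$ is the predicted probability of class $1$, $c_{10}$ the cost of predicting $1$ when the truth is $0$, and $c_{01}$ the cost of predicting $0$ when the truth is $1$), the expected losses of the two possible decisions are

$$\mathbb{E}[\text{loss} \mid \text{predict } 1] = (1-p)\,c_{10}, \qquad \mathbb{E}[\text{loss} \mid \text{predict } 0] = p\,c_{01}.$$

Predicting $1$ is preferable when $(1-p)\,c_{10} < p\,c_{01}$, i.e. when $p > c_{10}/(c_{10}+c_{01})$. With equal costs this threshold is $0.5$, and at $p = 0.5$ the two expected losses coincide exactly, which is why the choice of label cannot matter there.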


Despite the interesting comments on setting appropriate threshold values, I think the main question was about what the threshold value actually means for the prediction.

There are different ways to implement a thresholding function. Your proposed way says that for a predicted probability p:

if p > threshold, it is predicted to be 1,
and
if p < threshold, it is predicted to be 0.

This would indeed leave a gap at p == threshold, and in order to prevent this, most implementations will use a one-sided test:

if p > threshold, it is predicted to be 1, and all other values are 0,
or
if p < threshold, it is predicted to be 0, and all other values are 1.

Not only is this computationally cheaper, it also prevents the problem at p == threshold from ever occurring.
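As a minimal sketch of the one-sided test (plain NumPy; the function name and the choice of >= are my own, not any particular library's convention):

    import numpy as np

    def predict_labels(p, threshold=0.5):
        # One-sided test: p >= threshold -> 1, everything else -> 0.
        # Every probability is covered, so p == threshold has a
        # well-defined label (here it maps to 1).
        return (np.asarray(p) >= threshold).astype(int)

    print(predict_labels([0.2, 0.5, 0.8]))  # -> [0 1 1]

Using >= implements the second variant above (p < threshold maps to 0, everything else to 1); using > instead would implement the first, sending p == threshold to 0. Either way, there is no gap.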
