What is a good objective function for allowing predictions close to 0?

Let's say we want to predict the probability of rain. So just the binary case: rain or no rain.

In many cases it makes sense to keep this in the [5%, 95%] interval, and for many applications that is enough; it is often even desirable that the classifier is not too confident. Hence cross entropy (CE) is chosen:

$$H_{y'} (y) := - \sum_{i} y_{i}' \log (y_i)$$

But cross entropy practically makes it very hard for the classifier to learn to predict 0. Is there another objective function that does not behave so extremely near 0?
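To see how strongly the loss reacts near 0, here is a small illustrative Python snippet (not part of the original question) that evaluates the binary form of the cross entropy above for a rainy day ($y' = 1$) as the predicted probability shrinks:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross entropy: -[y' log(y) + (1 - y') log(1 - y)]."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Penalty for a single rainy day (y' = 1) as the predicted probability shrinks:
for p in [0.05, 0.01, 0.001, 1e-6]:
    print(f"predicted {p:>8}: loss {cross_entropy(1.0, p):.2f}")

# The loss grows without bound as p -> 0, so one wrong near-zero prediction
# outweighs many correct ones and pushes the model away from the extremes.
```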

Why it matters

There might be cases where it is possible to give a prediction of 0% (or at least something much closer to 0, like $10^{-6}$), for example in a desert. And there might be applications where one needs these (close to) zero predictions, for example when you want to predict the probability that something happens at least once. If the classifier always predicts at least a 5% chance, then the probability of rain at least once in 15 days is

$$1 - (1-0.05)^{15} \approx 54\%$$

but if the classifier can practically output 0.1% as well, then this is only

$$1 - (1-0.001)^{15} \approx 1.5\%$$
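These two numbers are easy to reproduce; here is a tiny Python check (the helper name `at_least_once` is mine, and it assumes the 15 days are independent with a constant per-day probability):

```python
# Probability of rain on at least one of `days` independent days,
# given a constant per-day predicted probability p.
def at_least_once(p, days=15):
    return 1 - (1 - p) ** days

print(f"{at_least_once(0.05):.2%}")   # ~53.67%, the ~54% figure above
print(f"{at_least_once(0.001):.2%}")  # ~1.49%, the ~1.5% figure above
```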

I could also imagine this to be important for medical tests or for videos.


Please be aware that neural networks are normally poorly calibrated. Essentially this means that, for binary classification, neural networks are good at keeping the prediction score of a sample on the right side (above or below 50%, depending on the class), but the actual values do not necessarily represent plausible real-world probabilities.

There is a lot of ongoing research on how to calibrate neural networks, e.g. this approach.
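For a concrete starting point, here is a minimal, generic post-hoc calibration sketch using scikit-learn's `CalibratedClassifierCV` (my own illustration, not the specific approach linked above); isotonic calibration in particular can map scores much closer to 0 than the raw model output when the data supports it:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary problem standing in for "rain / no rain".
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Isotonic regression learns a monotone mapping from raw scores to
# probabilities on held-out folds; unlike the raw scores it can get
# very close to 0 or 1.
calibrated = CalibratedClassifierCV(
    LogisticRegression(max_iter=1000), method="isotonic", cv=5
).fit(X_train, y_train)

probs = calibrated.predict_proba(X_test)[:, 1]
print(f"min predicted probability: {probs.min():.4f}")
```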
