What is a good objective function for allowing predictions close to 0?
Let's say we want to predict the probability of rain, so just the binary case: rain or no rain.
In many cases it makes sense to keep this prediction in the [5%, 95%] interval, and for many applications that is enough; it is even desirable that the classifier is not too confident. Hence cross entropy (CE) is chosen:
$$H_{y'} (y) := - \sum_{i} y_{i}' \log (y_i)$$
But cross entropy makes it practically very hard for the classifier to learn to predict 0. Is there another objective function that does not behave so extremely around 0?
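To see how extreme CE gets near 0, here is a minimal sketch in plain Python (the helper `binary_cross_entropy` is just illustrative, not a library call) that evaluates the per-sample loss as the prediction approaches 0:

```python
import math

def binary_cross_entropy(y_true, y_pred):
    # Per-sample binary CE, matching the definition above:
    # -(y' * log(y) + (1 - y') * log(1 - y))
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

for p in [0.05, 0.01, 0.001, 1e-6]:
    print(f"p={p:<8}  loss if it rains: {binary_cross_entropy(1, p):6.2f}"
          f"  loss if it stays dry: {binary_cross_entropy(0, p):.6f}")
```

A single rainy sample predicted at $10^{-6}$ costs $-\log(10^{-6}) \approx 13.8$, while lowering the prediction on a dry sample from 1% toward 0 saves only about 0.01 per sample, so the loss trade-off pushes the classifier away from 0.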
Why it matters
There might be cases where a prediction of 0% (or at least something much closer to 0, like $10^{-6}$) is justified, for example in a desert. And there might be applications where one needs these (close to) zero predictions, for example when you want to predict the probability that something happens at least once. If the classifier always predicts at least a 5% chance, then the probability of rain at least once in 15 days is
$$1 - (1-0.05)^{15} \approx 54\%$$
but if the classifier can practically output 0.1% as well, then this is only
$$1 - (1-0.001)^{15} \approx 1.5\%$$
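The two numbers above can be verified with a short sketch (same at-least-once formula; the helper name `prob_at_least_once` is just for illustration):

```python
def prob_at_least_once(p_daily, days=15):
    # P(event happens at least once in `days` independent trials)
    return 1 - (1 - p_daily) ** days

print(prob_at_least_once(0.05))   # ~0.537 -> ~54%
print(prob_at_least_once(0.001))  # ~0.0149 -> ~1.5%
```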
I could also imagine this being important for medical tests or for videos.
Tags: objective-function, xgboost, optimization
Category: Data Science