How to output buckets of probabilities?
I am dealing with an unbalanced binary classification problem. The problem is so unbalanced (2:98) and so hard to predict that I am interested in the probability of the positive outcome rather than in predicting the actual binary output. Depending on the model used, this requires either calibrating the model's probabilities (tree-based models) or transforming scores into probabilities using some spline (NN).
But in the end, for all practical purposes I use buckets of probabilities. With 2% being the average, I use predefined buckets like these: [0,0.5%], [0.5%,1%], [1%,2%], [2%,5%], [5%,10%], [10%,25%], [25%,50%], [50%,100%]. To assign a bucket to an instance, the simple way is to put the instance in the bucket that contains its predicted probability. I am aware this might not be optimal (it discards information in the output, the buckets themselves may not be optimal, etc.), but let's say those buckets are an expert-based constraint (or enforced by law in some cases).
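To make the "simple way" concrete, here is a minimal sketch of what I currently do. The names `EDGES`, `assign_bucket` and `bucket_label` are just illustrative, and I am assuming each bucket is closed on the left and open on the right (so exactly 2% goes to [2%,5%]), except for the last bucket, which includes 100%.

```python
from bisect import bisect_right

# Bucket boundaries expressed as fractions (0.5% -> 0.005, etc.).
EDGES = [0.0, 0.005, 0.01, 0.02, 0.05, 0.10, 0.25, 0.50, 1.0]

# Interior boundaries only: values below 0.005 fall in bucket 0,
# values at or above 0.50 fall in the last bucket (index 7).
_INTERIOR = EDGES[1:-1]


def assign_bucket(p):
    """Return the index (0..7) of the bucket containing probability p.

    Assumption: buckets are left-closed, right-open, except the last
    one, which also contains p == 1.0.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    return bisect_right(_INTERIOR, p)


def bucket_label(idx):
    """Human-readable label for a bucket index, e.g. '[2.0%, 5.0%]'."""
    return f"[{EDGES[idx]:.1%}, {EDGES[idx + 1]:.1%}]"


if __name__ == "__main__":
    for p in [0.0, 0.004, 0.02, 0.3, 1.0]:
        idx = assign_bucket(p)
        print(p, idx, bucket_label(idx))
```

The point is that the bucket is a pure post-processing step on the calibrated probability; the model itself never sees the bucket boundaries.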
I have some doubts about whether it is necessary to go through individual probabilities to predict a bucket. I also have the feeling that predicting buckets directly, instead of probabilities, could help remove some of the model instability I am observing (basically, different random initialisations of a NN can yield very different individual outputs; I usually deal with this, painfully, through regularisation).
To get a general solution, I tried to find a loss function that matches this ordinal, multi-class, probability-calibrated problem, but couldn't find anything. So I am asking you: is there a general approach to predicting given buckets of probabilities? (Feel free to give model-specific approaches instead, or to explain why this might be a bad idea.)