How can I tell if my model is overfitting from the distribution of predicted probabilities?
Hi all,
I am training a LightGBM (light gradient boosting) model and have set the usual parameters that help prevent overfitting. I plot the distribution of the predicted probabilities from the model (i.e. the probability of having cancer), after calibrating with a calibrated classifier, as a histogram or KDE. As you can see in the plot below, the probabilities for my class 1 are concentrated at the upper and lower ends.
I have tried playing around with the bandwidth to smooth this a little, but it doesn't smooth the bumps much. What do you think this shows about my model? Isn't it a good thing that, for class 1 (has cancer), the model is assigning a greater probability to this class?
I am unsure how to interpret this or where I could be going wrong.
The red curve is the positive class (has cancer) and the blue curve is the negative class (doesn't have cancer). Below is the code used to generate the plot.
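import matplotlib.pyplot as plt

# df holds the true label and the calibrated predicted probability
results = df[['label', 'predicted_prob']]
colors = ['b', 'r']  # blue = class 0 (no cancer), red = class 1 (has cancer)
for label in [0, 1]:
    # KDE of the predicted probabilities for this class
    results.loc[results['label'] == label, 'predicted_prob'].plot.kde(bw_method=0.35, color=colors[label])
plt.xlim(0, 1)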
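In case the bumps are partly a KDE artifact (the kernel smooths mass across and beyond the 0 and 1 boundaries), here is a minimal sketch of counting the predicted probabilities per class in fixed-width bins instead, which is what a plain histogram would show. The function name `binned_counts` and the example values are made up for illustration and are not my real model output; it assumes labels are 0/1 and probabilities lie in [0, 1].

```python
def binned_counts(probs, labels, n_bins=10):
    """Count predicted probabilities per class in equal-width bins on [0, 1]."""
    counts = {0: [0] * n_bins, 1: [0] * n_bins}
    for p, y in zip(probs, labels):
        # Put p == 1.0 into the last bin instead of overflowing past it.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[y][idx] += 1
    return counts


# Made-up values for illustration only (not the real model output).
example_probs = [0.05, 0.10, 0.92, 0.97, 0.08, 0.55, 0.99, 0.03]
example_labels = [0, 0, 1, 1, 1, 0, 1, 0]
example_counts = binned_counts(example_probs, example_labels, n_bins=5)
print(example_counts)
```

With the real data, `probs` would be `results['predicted_prob']` and `labels` would be `results['label']`. Raw counts per bin make it easier to see how many class 1 samples actually sit near 0 versus near 1, independent of the bandwidth choice.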
Topic lightgbm probability-calibration probability classification python
Category Data Science