Right way to compare model scores for Next Best Action
I have around 15 classification models for different products, built in different ways: some are Random Forests, some are Gradient Boosting models, some were downsampled in one way and others in another, and some are trained on 12 months of history while others use 24 months. I have to compare their scores to choose which product to offer. All models share the same target definition: 1 if the customer bought the product, 0 if not.
I have read about this and found an article about calibration, but I cannot really understand why calibrated scores become comparable across models.
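For context on what calibration does here, a minimal sketch using scikit-learn on synthetic data (the dataset, model settings, and isotonic method are illustrative assumptions, not your actual setup): after calibration, each model's output is mapped onto the empirical probability scale, so a calibrated score of 0.3 means roughly a 30% chance of purchase regardless of which model produced it, which is what makes cross-model comparison meaningful.

```python
# Sketch: calibrating two differently-built classifiers so their scores
# live on the same probability scale. Synthetic data stands in for the
# real purchase history; isotonic calibration is one common choice.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import brier_score_loss

X, y = make_classification(n_samples=4000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

for base in (RandomForestClassifier(n_estimators=50, random_state=0),
             GradientBoostingClassifier(random_state=0)):
    # CalibratedClassifierCV fits the base model on cross-validation folds
    # and learns a mapping from raw scores to observed event frequencies.
    calibrated = CalibratedClassifierCV(base, method="isotonic", cv=3)
    calibrated.fit(X_train, y_train)
    p = calibrated.predict_proba(X_test)[:, 1]
    # Brier score measures how close predicted probabilities are to outcomes;
    # lower is better-calibrated.
    print(type(base).__name__, round(brier_score_loss(y_test, p), 3))
```

Without calibration, a 0.7 from a downsampled Random Forest and a 0.7 from a Gradient Boosting model trained on a different window need not mean the same purchase probability; after calibration they approximately do.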
I also found some discussions about using each product's annual conversion rate and linearizing it, but I didn't really understand that methodology either.
Does anyone know how to do this, and why it works?
Thanks
Tags: mathematics, score, scoring, classification, statistics
Category: Data Science