Binary classification pipeline to select a threshold

There are quite a few questions about optimising the binary decision threshold in a classification problem. However, I haven't found a single end-to-end solution to this problem.

In an existing project, I have come up with the following pipeline to train a binary classifier:

  1. Outer CV (because the dataset is small to moderate in size):
    1. Inner CV to tune hyperparameters
    2. Train a model with the tuned hyperparameters on the outer-CV train set
    3. Predict on the outer-CV test set
    4. Find the optimal threshold using the predicted probabilities
    5. Compute the score after converting predicted probabilities to classes with the optimal threshold
  2. Report the average/standard deviation of the scores along with the thresholds
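The steps above can be sketched roughly as follows (assuming scikit-learn; the estimator, parameter grid, synthetic data, F1 as the score, and the threshold grid are all placeholder assumptions, not part of the original question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Placeholder data standing in for the real dataset.
X, y = make_classification(n_samples=300, random_state=0)

outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores, thresholds = [], []

for train_idx, test_idx in outer.split(X, y):
    X_tr, X_te = X[train_idx], X[test_idx]
    y_tr, y_te = y[train_idx], y[test_idx]

    # Inner CV: tune hyperparameters on the outer-CV train set.
    inner = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1, 10]},
        cv=3,
    )
    inner.fit(X_tr, y_tr)

    # Predict probabilities on the outer-CV test set.
    proba = inner.predict_proba(X_te)[:, 1]

    # Pick the threshold that maximises the score (F1 here) on this fold.
    grid = np.linspace(0.05, 0.95, 19)
    f1s = [f1_score(y_te, proba >= t) for t in grid]
    best = int(np.argmax(f1s))
    thresholds.append(grid[best])
    scores.append(f1s[best])

print(f"F1: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
print(f"threshold: {np.mean(thresholds):.3f} +/- {np.std(thresholds):.3f}")
```

Note that because each fold's threshold is chosen on that fold's own test set, the reported scores are somewhat optimistic; the per-fold threshold spread is the quantity discussed below.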

Since there is little to no deviation in the score across folds (although the standard deviation of the optimal threshold is 3.2), I then:

  1. Tune hyperparameters on the entire dataset
  2. Train the model with the tuned hyperparameters on the entire dataset
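These two final steps can be sketched as below (again assuming scikit-learn with placeholder data, estimator, and grid; note that `GridSearchCV` with the default `refit=True` already refits the best estimator on all the data, so both steps collapse into one call):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Placeholder data standing in for the real dataset.
X, y = make_classification(n_samples=300, random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    cv=5,
)
# Step 1: tune hyperparameters on the entire dataset.
# Step 2: refit=True (the default) retrains the best model on all data.
search.fit(X, y)
final_model = search.best_estimator_
print(search.best_params_)
```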

Now my questions are:

  1. Is this pipeline reasonable/correct? That is, have I missed anything, or are some parts unnecessary?
  2. How do I get the final optimal threshold for my model when predicting in production?

Topic hyperparameter-tuning cross-validation classification

Category Data Science


The pipeline seems fine, but nested CV (two levels of cross-validation) can be very time-consuming and may be overkill.

If you had some test data, it would be easy to pick whatever threshold optimises your cost function. One strategy could be to wait some time before rolling out to customers, generate more labeled test data in the meantime, and choose the threshold that optimises model performance on that data.
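A minimal sketch of that idea, assuming an asymmetric cost function (the cost weights, the synthetic labels, and the probability model are all made up for illustration):

```python
import numpy as np

def total_cost(y_true, y_pred, fp_cost=1.0, fn_cost=5.0):
    # Assumed business cost: a false negative is 5x as costly
    # as a false positive. These weights are placeholders.
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return fp * fp_cost + fn * fn_cost

# Placeholder held-out labels and predicted probabilities.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
proba = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, size=200), 0, 1)

# Evaluate every candidate threshold on the held-out data and
# keep the one with the lowest total cost.
candidates = np.linspace(0.05, 0.95, 19)
costs = [total_cost(y_true, (proba >= t).astype(int)) for t in candidates]
best_threshold = candidates[int(np.argmin(costs))]
print(best_threshold)
```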


One way to find a useful threshold in production is to test different candidate thresholds in production. If possible, create a multi-armed bandit setup where different thresholds are evaluated on the actual data using the most relevant evaluation metric.
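As an illustration, here is a hedged epsilon-greedy bandit over candidate thresholds. The reward function below is simulated (it assumes 0.5 is the truly best threshold); in production the reward would come from the live evaluation metric instead:

```python
import random

random.seed(0)

thresholds = [0.3, 0.4, 0.5, 0.6, 0.7]   # candidate arms
counts = [0] * len(thresholds)           # pulls per arm
values = [0.0] * len(thresholds)         # running mean reward per arm
epsilon = 0.1                            # exploration rate

def reward(threshold):
    # Stand-in for the live metric; assumes 0.5 is optimal.
    return 1.0 - abs(threshold - 0.5) + random.gauss(0, 0.05)

for _ in range(2000):
    if random.random() < epsilon:
        arm = random.randrange(len(thresholds))                      # explore
    else:
        arm = max(range(len(thresholds)), key=lambda i: values[i])   # exploit
    r = reward(thresholds[arm])
    counts[arm] += 1
    values[arm] += (r - values[arm]) / counts[arm]   # incremental mean update

best = thresholds[max(range(len(thresholds)), key=lambda i: values[i])]
print(best)
```

The bandit concentrates traffic on the threshold with the best observed reward while still occasionally exploring the alternatives, which is exactly the trade-off you want when the metric can only be measured on live data.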
