Binary classification pipeline to select a threshold

There are quite a few questions about optimising the binary decision threshold in a classification problem. However, I haven't found a single end-to-end solution to this problem.

In an existing project, I have come up with the following pipeline to train a binary classifier:

  1. Outer CV (because the dataset is small to moderate in size):
    1. Inner CV to tune hyperparameters
    2. Train a model with the tuned hyperparameters on the outer-CV train set
    3. Predict on the outer-CV test set
    4. Find the optimal threshold using the predicted probabilities
    5. Compute the score after converting predicted probabilities to classes with the optimal threshold
  2. Report the average/standard deviation of the scores along with the thresholds
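The steps above can be sketched roughly as follows (assuming scikit-learn; the estimator, parameter grid, synthetic data, F1 as the score, and the threshold grid are all placeholder assumptions, not part of the original question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Placeholder data standing in for the real dataset.
X, y = make_classification(n_samples=300, random_state=0)

outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores, thresholds = [], []

for train_idx, test_idx in outer.split(X, y):
    X_tr, X_te = X[train_idx], X[test_idx]
    y_tr, y_te = y[train_idx], y[test_idx]

    # Inner CV: tune hyperparameters on the outer-CV train set.
    inner = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1, 10]},
        cv=3,
    )
    inner.fit(X_tr, y_tr)

    # Predict probabilities on the outer-CV test set.
    proba = inner.predict_proba(X_te)[:, 1]

    # Pick the threshold that maximises the score (F1 here) on this fold.
    grid = np.linspace(0.05, 0.95, 19)
    f1s = [f1_score(y_te, proba >= t) for t in grid]
    best = int(np.argmax(f1s))
    thresholds.append(grid[best])
    scores.append(f1s[best])

print(f"F1: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
print(f"threshold: {np.mean(thresholds):.3f} +/- {np.std(thresholds):.3f}")
```

Note that because each fold's threshold is chosen on that fold's own test set, the reported scores are somewhat optimistic; the per-fold threshold spread is the quantity discussed below.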

Since there is little to no deviation in the score across folds (although the standard deviation of the optimal threshold is 3.2), I then:

  1. Tune hyperparameters on the entire dataset
  2. Train the model with the tuned hyperparameters on the entire dataset
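These two final steps can be sketched as below (again assuming scikit-learn with placeholder data, estimator, and grid; note that `GridSearchCV` with the default `refit=True` already refits the best estimator on all the data, so both steps collapse into one call):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Placeholder data standing in for the real dataset.
X, y = make_classification(n_samples=300, random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    cv=5,
)
# Step 1: tune hyperparameters on the entire dataset.
# Step 2: refit=True (the default) retrains the best model on all data.
search.fit(X, y)
final_model = search.best_estimator_
print(search.best_params_)
```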

Now my questions are:

  1. Is this pipeline reasonable/correct? That is, have I missed anything, or are some parts unnecessary?
  2. How do I get the final optimal threshold for my model when predicting in production?

Topic hyperparameter-tuning cross-validation classification

Category Data Science


The pipeline seems fine, but nested CV (two levels of cross-validation) can be very time-consuming and may be overkill.

If you had some test data, it would be easy to pick whatever threshold optimises your cost function. One strategy could be to wait some time before rolling out to customers, generate more labeled test data in the meantime, and choose the threshold that optimises model performance on that data.
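A minimal sketch of that idea, assuming an asymmetric cost function (the cost weights, the synthetic labels, and the probability model are all made up for illustration):

```python
import numpy as np

def total_cost(y_true, y_pred, fp_cost=1.0, fn_cost=5.0):
    # Assumed business cost: a false negative is 5x as costly
    # as a false positive. These weights are placeholders.
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return fp * fp_cost + fn * fn_cost

# Placeholder held-out labels and predicted probabilities.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
proba = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, size=200), 0, 1)

# Evaluate every candidate threshold on the held-out data and
# keep the one with the lowest total cost.
candidates = np.linspace(0.05, 0.95, 19)
costs = [total_cost(y_true, (proba >= t).astype(int)) for t in candidates]
best_threshold = candidates[int(np.argmin(costs))]
print(best_threshold)
```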


One way to find a useful threshold in production is to test different candidate thresholds in production. If possible, create a multi-armed bandit setup where different thresholds are evaluated on the actual data using the most relevant evaluation metric.
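As an illustration, here is a hedged epsilon-greedy bandit over candidate thresholds. The reward function below is simulated (it assumes 0.5 is the truly best threshold); in production the reward would come from the live evaluation metric instead:

```python
import random

random.seed(0)

thresholds = [0.3, 0.4, 0.5, 0.6, 0.7]   # candidate arms
counts = [0] * len(thresholds)           # pulls per arm
values = [0.0] * len(thresholds)         # running mean reward per arm
epsilon = 0.1                            # exploration rate

def reward(threshold):
    # Stand-in for the live metric; assumes 0.5 is optimal.
    return 1.0 - abs(threshold - 0.5) + random.gauss(0, 0.05)

for _ in range(2000):
    if random.random() < epsilon:
        arm = random.randrange(len(thresholds))                      # explore
    else:
        arm = max(range(len(thresholds)), key=lambda i: values[i])   # exploit
    r = reward(thresholds[arm])
    counts[arm] += 1
    values[arm] += (r - values[arm]) / counts[arm]   # incremental mean update

best = thresholds[max(range(len(thresholds)), key=lambda i: values[i])]
print(best)
```

The bandit concentrates traffic on the threshold with the best observed reward while still occasionally exploring the alternatives, which is exactly the trade-off you want when the metric can only be measured on live data.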
