Tuning a classifier for high precision, with no regard for recall

I understand this falls under the decision-making side of things rather than the probabilistic side, but for the work I am doing I need the classifier to have very high precision, because I cannot afford a false positive. I do not care about false negatives, and consequently do not care about recall. Since the classifier is currently binary, some might suggest adjusting the decision probability threshold from its current value of 0.5 (something like the sketch below), but I will eventually need to add a third class, and will therefore have to switch to three outputs with softmax. I am not aware of established methods for steering a pipeline towards high precision, and I am looking for ways to achieve this.
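(For concreteness, a minimal sketch of the thresholding idea with a softmax output; the function name and the 0.95 threshold are hypothetical. The rule is to accept the top class only when its probability clears a high bar, and abstain otherwise.)

    import numpy as np

    def predict_high_precision(probs, threshold=0.95, abstain=-1):
        # probs: (n_samples, n_classes) array of softmax outputs.
        # Accept the argmax class only when the model is confident enough;
        # otherwise return the abstain label, trading recall for precision.
        best = probs.argmax(axis=1)
        confident = probs.max(axis=1) >= threshold
        return np.where(confident, best, abstain)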

If it is any help, the problem is the classification of 256x256 grayscale images in a domain that is regarded as very difficult to classify, according to recent papers in the computer vision literature.

Topic: finetuning, multiclass-classification, image-classification

Category: Data Science


Since, in your comment to Eugen's answer, you say your data is imbalanced, you might find the focal loss function useful. From the abstract of the paper (Lin et al., "Focal Loss for Dense Object Detection"):

Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.

There is code available on GitHub.
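For reference, here is a minimal PyTorch-style sketch of binary focal loss (the defaults alpha=0.25 and gamma=2 follow the paper; the function name is mine):

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
        # Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).
        # The (1 - p_t)**gamma factor down-weights easy, well-classified
        # examples so training focuses on the hard ones.
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p = torch.sigmoid(logits)
        p_t = p * targets + (1 - p) * (1 - targets)      # prob. of the true class
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        return (alpha_t * (1 - p_t) ** gamma * bce).mean()

With gamma set to 0 this reduces to ordinary alpha-weighted cross-entropy, which is a useful sanity check.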


To make your learner cost-sensitive, you can augment your training data with additional "no" instances. If there are 10 times more "no" instances in your training data, errors on the "no"s will hurt much more, and your learner will come up with a decision scheme that is biased in that direction. On the other hand, the variance on the "yes"es will be lowered (watch out for overfitting here). After training, evaluate on your original, unmodified data, and you should get good results. A sketch of this oversampling step follows below.
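A minimal sketch of that oversampling step, assuming NumPy arrays for the data (the function name and the factor of 10 are illustrative):

    import numpy as np

    def oversample_negatives(X, y, factor=10, neg_label=0, seed=0):
        # Duplicate every "no" instance so it appears `factor` times in total,
        # making errors on the negative class dominate the training loss.
        rng = np.random.default_rng(seed)
        neg_idx = np.where(y == neg_label)[0]
        extra = np.repeat(neg_idx, factor - 1)   # (factor - 1) extra copies each
        idx = rng.permutation(np.concatenate([np.arange(len(y)), extra]))
        return X[idx], y[idx]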
