Excluding data via confidence score: Is it a good idea?

Let's say I have a model for a binary classification task (two classes, 0 and 1). It outputs a number between 0 and 1: if the output is greater than 0.5 we assign class 1, otherwise class 0.

Now let's say we discard any test-set result whose output falls between two thresholds, 0.4 and 0.6, to make the model more confident. To be clear: if the output falls in that band, the model just reports "I'm not confident about this image."
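For illustration, here is a minimal sketch of that kind of rejection rule. The function name and the 0.4/0.6 band are just assumptions taken from the description above, not a fixed recipe:

```python
import numpy as np

def predict_with_rejection(probs, lower=0.4, upper=0.6):
    """Map predicted probabilities to labels, abstaining inside the band.

    probs : array of model outputs in [0, 1] (probability of class 1)
    Returns an array with values 0, 1, or -1 (-1 means "not confident").
    """
    probs = np.asarray(probs, dtype=float)
    labels = (probs > 0.5).astype(int)            # usual 0.5 cut-off
    abstain = (probs >= lower) & (probs <= upper)  # inside the band -> reject
    labels[abstain] = -1
    return labels

# Outputs near 0.5 are flagged instead of being forced into a class:
print(predict_with_rejection([0.05, 0.45, 0.55, 0.93]))  # -> [ 0 -1 -1  1]
```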

Is this approach a good idea in general?

What if the task is binary classification on a medical dataset, such as COVID detection?

And if so, has this approach been used in any recent research?



In general, yes, the predicted probability can be used in this way. However, it's important to take into account that this probability is itself a prediction, i.e. the model could be wrong about it. For example, the model may predict a probability of 99% positive for an instance which is actually negative. As usual, it cannot be assumed that the model is correct: it has to be evaluated, in particular whether the instances tagged as "not confident" are actually more likely to be wrongly predicted than the rest.
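A sketch of that evaluation: compare the error rate inside the rejected band against the error rate on the instances the model keeps. The helper name and the default 0.4/0.6 band are assumptions for illustration only:

```python
import numpy as np

def rejection_report(probs, y_true, lower=0.4, upper=0.6):
    """Check whether rejected instances really are harder than kept ones."""
    probs, y_true = np.asarray(probs, dtype=float), np.asarray(y_true)
    preds = (probs > 0.5).astype(int)
    rejected = (probs >= lower) & (probs <= upper)
    kept = ~rejected

    acc_kept = (preds[kept] == y_true[kept]).mean() if kept.any() else float("nan")
    acc_rej = (preds[rejected] == y_true[rejected]).mean() if rejected.any() else float("nan")

    print(f"coverage (kept fraction):       {kept.mean():.2f}")
    print(f"accuracy on kept instances:     {acc_kept:.2f}")
    print(f"accuracy on rejected instances: {acc_rej:.2f}")
```

If the accuracy on the rejected band is not noticeably worse than on the kept instances, the band is throwing away data without actually buying extra confidence.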

An important question in this strategy is how to select the bounds of the "not confident" interval: arbitrarily choosing [0.4, 0.6], for example, may not be optimal.
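One way to choose the bounds less arbitrarily is to sweep a symmetric band around 0.5 on a held-out validation set and look at the coverage/accuracy trade-off. A rough sketch, assuming a symmetric band (the function name and the candidate widths are made up for illustration):

```python
import numpy as np

def sweep_band(probs, y_true, half_widths=(0.0, 0.05, 0.10, 0.15, 0.20)):
    """Accuracy on kept instances vs. coverage for several band half-widths."""
    probs, y_true = np.asarray(probs, dtype=float), np.asarray(y_true)
    preds = (probs > 0.5).astype(int)
    for w in half_widths:
        kept = (probs < 0.5 - w) | (probs > 0.5 + w)   # outside the band
        coverage = kept.mean()
        acc = (preds[kept] == y_true[kept]).mean() if kept.any() else float("nan")
        print(f"band [{0.5 - w:.2f}, {0.5 + w:.2f}]  "
              f"coverage={coverage:.2f}  accuracy={acc:.2f}")
```

You would then pick the narrowest band that reaches the accuracy the application needs; in a medical setting like the COVID example, it may be acceptable to reject more cases (lower coverage) in exchange for higher accuracy on the cases that are kept.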
