How to use confidence labels?

I have two sets of training data in CSV files. Each sample has a class label: 1 for memorable and 0 for not memorable. Each sample also has a confidence label. The class labels were assigned based on decisions from 3 people viewing the photos. When all three agreed, the class label was considered certain and given a confidence of 1. When they did not all agree, the majority decision was used as the class label, but with a confidence of only 0.66.

There is one file of test data, containing 2000 samples. My task is to obtain predictions for the class labels of these.

I have managed to obtain predictions, but only by dropping the confidence label column. However, I feel that my classifier would be more accurate if I used the confidence labels somehow.

How can I use these confidence labels? What are they? What am I supposed to do with them?

Also, if there is a way to give more weight to the more important data, could I then keep the confidence column instead of deleting it?

Topic binary-classification confidence labels dataset python

Category Data Science


Just a few ideas that can be done easily with these confidence scores:

  • Note that with only two possible values, 1 and 0.66, these confidence scores are effectively discrete. You could therefore frame the problem as a 3-class task, with the instances scored 0.66 forming their own class, 'probable'.
  • Simply remove the instances which have a confidence less than 1. This might improve performance, because these instances are more likely to contain errors and/or be ambiguous.
  • Design the problem as a regression task where the goal is to predict the score. This way the model might be able to capture the continuous values of confidence, maybe better than using classification probabilities.
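As a rough illustration, the three ideas above, plus the weighting idea raised in the question, can be sketched as simple label transformations. This is only a sketch under assumptions: the `rows` list, the function names, the integer `2` for the 'probable' class, and the regression target definition (confidence that the photo is memorable) are all hypothetical choices, not something specified in the question.

```python
# Hypothetical samples as (class_label, confidence) pairs, following the
# scheme in the question: confidence is 1.0 (all 3 agree) or 0.66 (2 of 3).
rows = [(1, 1.0), (1, 0.66), (0, 0.66), (0, 1.0)]

PROBABLE = 2  # assumed integer code for the extra 'probable' class


def three_class_label(label, confidence):
    """Idea 1: certain samples keep their class, uncertain ones become 'probable'."""
    return label if confidence >= 1.0 else PROBABLE


def keep_certain(samples):
    """Idea 2: drop every sample whose confidence is below 1."""
    return [(label, conf) for label, conf in samples if conf >= 1.0]


def regression_target(label, confidence):
    """Idea 3 (assumed definition): a score in [0, 1] for 'memorable'.

    A 'memorable' label keeps its confidence; a 'not memorable' label
    is mirrored, so 0 with confidence 0.66 becomes roughly 0.34.
    """
    return confidence if label == 1 else 1.0 - confidence


def sample_weights(samples):
    """Weighting idea from the question: use the confidence as a per-sample weight."""
    return [conf for _, conf in samples]


three_class = [three_class_label(label, conf) for label, conf in rows]
certain_only = keep_certain(rows)
targets = [regression_target(label, conf) for label, conf in rows]
weights = sample_weights(rows)
```

If you use scikit-learn, the weights could then be passed as the `sample_weight` argument of `fit` for estimators that support it (for example `LogisticRegression`), which keeps the uncertain samples in the training set while letting them count for less. Which of these variants works best is something you would need to check with cross-validation on your own data.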
