Conventional way of representing uncertainty

I am calculating metrics such as F1 score, recall, precision and accuracy in a multi-label classification setting. With randomly initialized weights, the logits (from which I compute the softmax predictions) might look like this for a batch size of 8 (I am using PyTorch):

import torch
logits = torch.tensor([[ 0.0334, -0.0896, -0.0832, -0.0682, -0.0707],
                       [ 0.0322, -0.0897, -0.0829, -0.0683, -0.0708],
                       [ 0.0324, -0.0894, -0.0829, -0.0682, -0.0705],
                       [ 0.0322, -0.0897, -0.0828, -0.0683, -0.0708],
                       [ 0.0333, -0.0895, -0.0832, -0.0682, -0.0708],
                       [ 0.0341, -0.0871, -0.0829, -0.0681, -0.0650],
                       [ 0.0329, -0.0894, -0.0832, -0.0678, -0.0716],
                       [ 0.0324, -0.0897, -0.0830, -0.0683, -0.0708]])

y_pred_label1 = logits[:,:3].softmax(1)
y_pred_label2 = logits[:,3:].softmax(1)
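
For reference, these softmax outputs are nearly uniform, e.g. for the first row:

print(y_pred_label1[0])  # roughly tensor([0.36, 0.32, 0.32]) -- no class reaches 0.5
print(y_pred_label2[0])  # roughly tensor([0.50, 0.50])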

With the correct labels (one-hot encoded):

y_true = torch.tensor([[0, 0, 1, 0, 1],
                       [0, 1, 0, 0, 1],
                       [0, 1, 0, 0, 1],
                       [0, 0, 1, 1, 0],
                       [0, 0, 1, 1, 0],
                       [1, 0, 0, 0, 1], # this row is correctly predicted
                       [0, 1, 0, 1, 0],
                       [0, 0, 1, 0, 1]])

I can calculate the metrics by taking the argmax (the index of the maximum value) along the class dimension of each row:

from torchmetrics.functional import f1_score

y_pred_label1 = y_pred_label1.argmax(1) # [0, 0, 0, 0, 0, 0, 0, 0]
y_pred_label2 = y_pred_label2.argmax(1) # [0, 0, 0, 0, 0, 1, 0, 0]

y_true_label1 = y_true[:,:3].argmax(1) #  [2, 1, 1, 2, 2, 0, 1, 2]
y_true_label2 = y_true[:, 3:].argmax(1) # [1, 1, 1, 0, 0, 1, 0, 1]

f1_label1 = f1_score(y_pred_label1, y_true_label1, num_classes=3)
f1_label2 = f1_score(y_pred_label2, y_true_label2, num_classes=2)

f1_label1, f1_label2

Output:

(tensor(0.1250), tensor(0.5000))

Only the sixth prediction (the row marked in the comment above) happens to be correct, while the rest are wrong. However, none of the predictive probabilities for label1 are above 0.5, which means the model is generally uncertain about its predictions. What is the common way of encoding this uncertainty? I would like the F1 score to be 0.0, because none of the predictive probabilities are above a 0.5 threshold.

An idea I had was to manually set these low-confidence predictions to some dummy label outside the target range, but there might be a better way to think about this.
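
For example, a rough sketch of that dummy-label idea for label1, reusing logits and y_true from above (the 0.5 threshold and the dummy index are arbitrary choices):

import torch
from torchmetrics.functional import f1_score

threshold = 0.5
probs1 = logits[:, :3].softmax(1)
conf1, pred1 = probs1.max(1)  # highest probability and its class index per row

# Replace low-confidence predictions with a dummy class (index 3) that never
# occurs in y_true, so those rows can never be counted as correct.
pred1 = torch.where(conf1 >= threshold, pred1, torch.full_like(pred1, 3))

# num_classes has to include the dummy class now
f1_label1 = f1_score(pred1, y_true[:, :3].argmax(1), num_classes=4)
# comes out as 0.0 here, since none of the label1 probabilities reach 0.5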

Since this type of operation seems to lack documentation in both the sklearn and torchmetrics libraries, I am not sure whether this is common practice.

Topic uncertainty pytorch multilabel-classification neural-network

Category Data Science


Sorry, it's not really an answer to the question asked, but I think there is a serious problem in your approach:

  • You're applying softmax on the vector of predictions across labels.
  • You also pick the argmax for both the predicted and true labels.

These two things make sense in the multiclass setting, but they are not consistent with the multi-label setting: in the multi-label setting, the labels and their probabilities are supposed to be independent of each other. Softmax produces a vector of probabilities that sum to 1, representing the chance of each class being the single correct one, which is only appropriate in the multiclass setting.
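
As a sketch, assuming the 5 columns of your logits tensor are treated as 5 independent labels, the multi-label approach would be an element-wise sigmoid followed by a per-label threshold:

import torch

probs = torch.sigmoid(logits)   # shape (8, 5): one independent probability per label
y_pred = (probs >= 0.5).long()  # each label decided on its own, no sum-to-1 constraint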

Taking the argmax of a multi-label true vector doesn't make any sense either: if the instance has more than one label (as intended), argmax still returns a single index (in PyTorch, the first maximal value), ignoring any other label.
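
To make this concrete with one of your rows:

import torch

row = torch.tensor([0, 0, 1, 0, 1])  # instance with two true labels
row.argmax()                         # tensor(2): a single index; the label at position 4 is dropped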

The evaluation of multi-label classification must be done for every label independently, i.e. you should obtain a precision, recall and F1 score for every label. You can then calculate macro- or micro-averaged precision, recall and F1 score across the labels.
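
A sketch with sklearn, assuming y_true and the thresholded y_pred are binary indicator arrays of shape (n_samples, n_labels), e.g. CPU tensors converted with .numpy():

from sklearn.metrics import classification_report, f1_score

f1_per_label = f1_score(y_true, y_pred, average=None)   # one F1 value per label
f1_macro = f1_score(y_true, y_pred, average='macro')    # unweighted mean across labels
f1_micro = f1_score(y_true, y_pred, average='micro')    # pooled over all label decisions

print(classification_report(y_true, y_pred))  # per-label precision/recall/F1 plus the averages

With this setup, a row where no label reaches the threshold simply gets no positive predictions, which is the usual way low confidence shows up in the metrics.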
