Conventional way of representing uncertainty
I am calculating metrics such as F1 score, recall, precision and accuracy in a multilabel classification setting. With randomly initialized weights, the model's raw outputs (logits) for a batch size of 8 might look like this (I am using PyTorch):
import torch
logits = torch.tensor([[ 0.0334, -0.0896, -0.0832, -0.0682, -0.0707],
                       [ 0.0322, -0.0897, -0.0829, -0.0683, -0.0708],
                       [ 0.0324, -0.0894, -0.0829, -0.0682, -0.0705],
                       [ 0.0322, -0.0897, -0.0828, -0.0683, -0.0708],
                       [ 0.0333, -0.0895, -0.0832, -0.0682, -0.0708],
                       [ 0.0341, -0.0871, -0.0829, -0.0681, -0.0650],
                       [ 0.0329, -0.0894, -0.0832, -0.0678, -0.0716],
                       [ 0.0324, -0.0897, -0.0830, -0.0683, -0.0708]])
y_pred_label1 = logits[:, :3].softmax(1)  # probabilities over the first 3 columns (label1 has 3 classes)
y_pred_label2 = logits[:, 3:].softmax(1)  # probabilities over the last 2 columns (label2 has 2 classes)
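For reference, the label1 probabilities come out nearly uniform, so no class reaches 0.5 in any row:
print(y_pred_label1[0])             # roughly tensor([0.3605, 0.3187, 0.3208])
print(y_pred_label1.max(1).values)  # every row's maximum is around 0.36, well below 0.5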
With the correct labels (one-hot encoded):
y_true = torch.tensor([[0, 0, 1, 0, 1],
                       [0, 1, 0, 0, 1],
                       [0, 1, 0, 0, 1],
                       [0, 0, 1, 1, 0],
                       [0, 0, 1, 1, 0],
                       [1, 0, 0, 0, 1],  # this row is correctly predicted
                       [0, 1, 0, 1, 0],
                       [0, 0, 1, 0, 1]])
I can calculate the metrics by taking the argmax (the index of the max value) along the class dimension of each label group:
from torchmetrics.functional import f1_score
y_pred_label1 = y_pred_label1.argmax(1) # [0, 0, 0, 0, 0, 0, 0, 0]
y_pred_label2 = y_pred_label2.argmax(1) # [0, 0, 0, 0, 0, 1, 0, 0]
y_true_label1 = y_true[:,:3].argmax(1) # [2, 1, 1, 2, 2, 0, 1, 2]
y_true_label2 = y_true[:, 3:].argmax(1) # [1, 1, 1, 0, 0, 1, 0, 1]
f1_label1 = f1_score(y_pred_label1, y_true_label1, num_classes=3)
f1_label2 = f1_score(y_pred_label2, y_true_label2, num_classes=2)
f1_label1, f1_label2
Output:
(tensor(0.1250), tensor(0.5000))
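These numbers are just the per-label accuracies (with micro averaging over all classes, F1 reduces to accuracy here), which a quick sanity check confirms:
(y_pred_label1 == y_true_label1).float().mean()  # tensor(0.1250) -- 1 of 8 rows correct
(y_pred_label2 == y_true_label2).float().mean()  # tensor(0.5000) -- 4 of 8 rows correct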
Only the sixth prediction (the commented row) happens to be correct for label1, while the rest are wrong. However, none of the predictive probabilities for label1 are above 0.5, which means the model is generally uncertain about its predictions. What is the common way of encoding this uncertainty? I would like the F1 score to be 0.0, because none of the predictive probabilities are above a 0.5 threshold.
An idea I had was to set these values manually by using some dummy label outside the target range, but there might be a better way to think about this.
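Roughly what I have in mind (the 0.5 threshold and the dummy class index 3 are arbitrary choices of mine, not an existing API):
threshold = 0.5
probs_label1 = logits[:, :3].softmax(1)
pred_label1 = probs_label1.argmax(1)
# rows where no class reaches the threshold get a dummy class that never occurs in y_true
pred_label1[probs_label1.max(1).values < threshold] = 3
# with the extra (never-true) class, every uncertain prediction counts as wrong
f1_score(pred_label1, y_true[:, :3].argmax(1), num_classes=4)  # tensor(0.) here, since no row is confident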
Since this type of operation seems to be undocumented in both the sklearn and torchmetrics libraries, I am not sure whether this is common practice.
Topic uncertainty pytorch multilabel-classification neural-network
Category Data Science