How to compute f1_score for multiclass multilabel classification

I have used one-hot encoding ([1,0,0], [0,1,0], [0,0,1]) for the targets of my functional-API classification model. Predicting probabilities for the test data with yprob = model.predict(testX) gives me:

yprob = array([[0.18120882, 0.5803128 , 0.22847839],
       [0.0101245 , 0.12861261, 0.9612609 ],
       [0.16332535, 0.4925239 , 0.35415074],
       ...,
       [0.9931931 , 0.09328955, 0.01351734],
       [0.48841736, 0.25034943, 0.16123319],
       [0.3807928, 0.42698202, 0.27493873]], dtype=float32)

I would like to compute the Accuracy, F1 score and the confusion matrix from this.

The Sequential API offers a predict_classes function to do this:

yclasses = model.predict_classes(testX), and then with sklearn's f1_score function we could compute all those values.

How could I apply this to the predicted probabilities of the test data for multiclass multilabel classification?

My second question: does the highest value in each row of yprob = model.predict(testX) correspond to the predicted class? For example, [0.18120882, 0.5803128, 0.22847839] is the first row, and its highest value is 0.5803128. Does that correspond to the one-hot vector [0, 1, 0], i.e. the second label, because it is the second element of the row?

Topic f1score multiclass-classification deep-learning accuracy scikit-learn

Category Data Science


There seems to be some confusion between multiclass and multilabel classification:

  • Multiclass is the regular case, where the task consists in predicting exactly one among N possible classes. For example, an image can be either a dog or a horse or a cat, but always exactly one of these three animals.
  • Multilabel is when the task consists in predicting a set of classes. For example, an image can be labelled with any subset of {dog, horse, cat}: it could be {dog, cat}, it could be {horse}, it could be {dog, horse, cat}, it could even be the empty set (no animal at all).

Practically in the multilabel case you predict each possible animal independently as a binary problem, so for every image the system answers 3 questions:

  • does this image contain a dog? (y/n)
  • does this image contain a horse? (y/n)
  • does this image contain a cat? (y/n)

Because each question is predicted independently, it doesn't make sense to pick the class which has the max probability. In effect there are 3 independent binary classification problems and 3 corresponding confusion matrices.
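To make this concrete, here is a minimal sketch of the multilabel evaluation, assuming a common 0.5 decision threshold on each label independently; the toy y_true and yprob arrays below are hypothetical, not your actual data:

```python
import numpy as np
from sklearn.metrics import f1_score, multilabel_confusion_matrix

# Hypothetical ground truth (sets encoded as binary vectors) and
# hypothetical predicted probabilities for 3 independent labels.
y_true = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [1, 1, 0]])
yprob = np.array([[0.18, 0.58, 0.22],
                  [0.01, 0.12, 0.96],
                  [0.99, 0.61, 0.01]])

# Each label is an independent binary decision: threshold each
# probability at 0.5 instead of taking an argmax across labels.
y_pred = (yprob >= 0.5).astype(int)

# One 2x2 confusion matrix per label (shape: n_labels x 2 x 2),
# and a single F1 score aggregated over all binary decisions.
per_label_cm = multilabel_confusion_matrix(y_true, y_pred)
f1_micro = f1_score(y_true, y_pred, average="micro")
```

Note that the threshold of 0.5 is just the default choice; in practice it can be tuned per label.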

Apparently you didn't mean to take the multilabel case into account, and you don't have any image labelled with multiple animals, right? If so, you should change the system to solve a regular multiclass problem.

It looks like the confusion might come from the one-hot encoding of the class: maybe you thought that, the class being categorical, it would be a mistake to encode it as a numerical value. This is true for categorical features, but not for the target: you can perfectly well use, for instance, LabelEncoder to represent the class as a single target. This is much simpler and probably more appropriate for your problem. One difference you will notice is that the predicted probabilities then sum to one, because the classifier no longer treats the classes as independent (as opposed to your current experiment, where the probabilities in each row don't sum to 1).
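This also answers your second question: in the multiclass setting, the index of the highest probability in each row is the predicted class, so np.argmax recovers class indices from both the probability rows and the one-hot ground truth, and sklearn's metrics can then be applied directly. A minimal sketch, with hypothetical arrays standing in for model.predict(testX) and your encoded test labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

# Hypothetical predicted probabilities (one row per test sample)
# and hypothetical one-hot-encoded ground truth.
yprob = np.array([[0.18, 0.58, 0.24],
                  [0.01, 0.13, 0.86],
                  [0.70, 0.20, 0.10],
                  [0.10, 0.30, 0.60]])
y_true_onehot = np.array([[0, 1, 0],
                          [0, 0, 1],
                          [1, 0, 0],
                          [1, 0, 0]])

# argmax along each row turns probabilities (and one-hot vectors)
# into class indices: the position of the highest value is the class.
y_pred = np.argmax(yprob, axis=1)          # -> [1, 2, 0, 2]
y_true = np.argmax(y_true_onehot, axis=1)  # -> [1, 2, 0, 0]

# Standard multiclass metrics on the class indices.
acc = accuracy_score(y_true, y_pred)
f1_macro = f1_score(y_true, y_pred, average="macro")
cm = confusion_matrix(y_true, y_pred)
```

With predict_classes deprecated in recent Keras versions, np.argmax(model.predict(testX), axis=1) is the usual replacement; "macro" averaging treats all classes equally, while "micro" or "weighted" may suit imbalanced data better.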
