Comparing multiclass classifiers with a "No Answer" class

I have three classifiers that classify words into four classes. Every word that does not fit into any of these four classes gets classified as "No Answer". I would like to compare the classifiers using precision, recall, and F1 score. Do I have to ignore the "No Answer" class when calculating the average precision and so on, or is it important to include it?

Topic multiclass-classification named-entity-recognition evaluation classification machine-learning



Precision, recall and F1 score are defined for the binary case (one positive class against everything else), so to apply them to the multiclass case you need a trick: compute them per class, treating each class in turn as the positive one, and then average the results. A typical choice is to average the recall per class: for each class, you calculate which fraction of the words actually in that class are correctly classified. balanced_accuracy_score() in scikit-learn does that for you automatically.
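To make the per-class averaging concrete, here is a minimal sketch in plain Python. The label names (`PER`, `LOC`, `ORG`, `MISC`, `NO_ANSWER`) and the toy data are made up for illustration, since the question does not say what the four classes are. The helper names `per_class_scores` and `macro_average` are also hypothetical. The macro-averaged recall computed here is the same quantity the answer describes as the average recall per class.

```python
from collections import Counter

def per_class_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)}, treating each label one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the truth was something else
            fn[t] += 1  # true class t was missed
    scores = {}
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1)
    return scores

def macro_average(scores):
    """Unweighted mean of precision, recall and F1 over all classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))

# Hypothetical labels and data, for illustration only.
labels = ["PER", "LOC", "ORG", "MISC", "NO_ANSWER"]
y_true = ["PER", "PER", "LOC", "ORG", "MISC", "NO_ANSWER", "NO_ANSWER", "LOC"]
y_pred = ["PER", "LOC", "LOC", "ORG", "NO_ANSWER", "NO_ANSWER", "PER", "LOC"]

scores = per_class_scores(y_true, y_pred, labels)
macro_p, macro_r, macro_f1 = macro_average(scores)
print(f"macro precision={macro_p:.3f} recall={macro_r:.3f} f1={macro_f1:.3f}")
```

Running the same function once per classifier on the same ground truth gives directly comparable numbers. Note that this sketch scores a class with no predictions as precision 0, which is a convention chosen here, not the only possible one.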

Recall does not take false positives into account, so if there are words for which the model should say "no answer" (i.e. words whose ground truth is "no class"), then you should include that class in the average. Otherwise the model would benefit from just taking a shot at every word it is given and never classifying any as "no class".
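The effect can be seen in a small sketch with two hypothetical models: one that always guesses one of the four classes, and one that mostly abstains correctly. The class names `A` to `D`, the `NO_ANSWER` label, the data, and the `macro_recall` helper are all assumptions made for illustration.

```python
def macro_recall(y_true, y_pred, labels):
    """Average of per-class recall over the given labels only."""
    recalls = []
    for c in labels:
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / support if support else 0.0)
    return sum(recalls) / len(recalls)

# Hypothetical data: four real classes plus words with no correct class.
classes = ["A", "B", "C", "D"]
y_true = ["A", "B", "C", "D", "NO_ANSWER", "NO_ANSWER", "NO_ANSWER", "NO_ANSWER"]

# Model 1 never abstains; model 2 abstains on three of the four "no answer" words.
always_guess = ["A", "B", "C", "D", "A", "B", "C", "D"]
abstains = ["A", "B", "C", "D", "NO_ANSWER", "NO_ANSWER", "NO_ANSWER", "A"]

# Ignoring the "no answer" class: both models look perfect.
guess_without = macro_recall(y_true, always_guess, classes)
abstain_without = macro_recall(y_true, abstains, classes)

# Including it: the model that always guesses is penalized.
guess_with = macro_recall(y_true, always_guess, classes + ["NO_ANSWER"])
abstain_with = macro_recall(y_true, abstains, classes + ["NO_ANSWER"])

print(guess_without, abstain_without)  # identical scores
print(guess_with, abstain_with)        # the abstaining model scores higher
```

With the "no answer" class left out, averaged recall cannot tell the two models apart, even though the first one labels every unclassifiable word incorrectly.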
