How can I use a confusion matrix in image captioning?

Question

Lei

May 18, 2022, 19:31

I read that a confusion matrix is used for image classification. Can I also draw one for image captioning, for example during the model evaluation phase? If so, how do I start?

Topic computer-vision confusion-matrix

Category Data Science

Erwan · Accepted Answer · May 18, 2022, 09:51

There's a misunderstanding here: a confusion matrix is a standard tool for evaluating a classification task, i.e. one where the target is a categorical variable. It is a table showing, for every pair of classes X and Y, the number of test instances whose true class is X and whose predicted class is Y. This is practical only with a small number of classes; otherwise the matrix becomes unreadable.
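To make the definition concrete, here is a minimal sketch of how such a table can be built for a classification task. The function name `confusion_matrix` and the toy labels are made up for illustration; in practice you would probably use an existing library implementation.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count test instances for every (true class, predicted class) pair."""
    # Use every class seen in either list, in a fixed (sorted) order.
    classes = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    # Rows are true classes, columns are predicted classes.
    matrix = [[counts[(t, p)] for p in classes] for t in classes]
    return classes, matrix


# Toy data, invented for this example.
true_labels = ["cat", "dog", "cat", "bird", "dog", "cat"]
predicted_labels = ["cat", "cat", "cat", "bird", "dog", "dog"]

classes, matrix = confusion_matrix(true_labels, predicted_labels)
print(classes)
for true_class, row in zip(classes, matrix):
    print(true_class, row)
```

Note that this only works because every instance has exactly one label from a small, fixed set.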

The task of image captioning is not classification. The target is unstructured data (text), not a categorical variable with a finite set of possible values. Therefore it requires a different (and more complex) evaluation method. Evaluation is often done as in machine translation, using a measure of similarity between the gold-standard caption and the predicted caption. Usually one should use the state-of-the-art evaluation method, i.e. the one used in recent papers published on this task.
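As a rough illustration of what "a measure of similarity" between two captions can look like, the sketch below computes a clipped unigram precision, which is the basic ingredient of BLEU-style metrics from machine translation. It is a deliberate simplification (whitespace tokenization, single reference, no higher-order n-grams, no brevity penalty), not a full implementation of any published metric; the function name and the example captions are made up.

```python
from collections import Counter


def clipped_unigram_precision(reference, candidate):
    """Fraction of candidate words that also appear in the reference.

    Each reference word can be matched at most as many times as it
    occurs in the reference (the "clipping" used in BLEU-style metrics).
    """
    ref_counts = Counter(reference.lower().split())
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    cand_counts = Counter(cand_tokens)
    overlap = sum(
        min(count, ref_counts[token]) for token, count in cand_counts.items()
    )
    return overlap / len(cand_tokens)


# Invented example captions.
reference = "a dog runs on the beach"       # gold-standard caption
candidate = "a dog is running on the sand"  # predicted caption

score = clipped_unigram_precision(reference, candidate)
print(f"similarity: {score:.3f}")
```

For a real evaluation, an established implementation of the metrics reported in recent captioning papers is a better choice than a hand-written score like this one.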
