What is the appropriate statistical test to compare the MAUC scores from two machine learning classifiers?
I would like to compare the scores of two multi-class classifiers. I have calculated the MAUC score for each of the algorithms, and now I want to see whether there is a statistically significant difference between the results.
From what I have read so far, the McNemar test seems to be a good option; however, I am not sure exactly how to use it. In this article, there is an example of how to use McNemar's test to compare the accuracy of two algorithms.
The scores I would like to compare are 0.809 and 0.812. Following the tutorial, I came up with this table, to which I want to apply the McNemar test implemented here.
                  | model 1 (correct) | model 1 (wrong)
model 2 (correct) | 0.809             | 0.003
model 2 (wrong)   | 0.000             | 0.191
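For comparison, as far as I understand the tutorial, McNemar's test is applied to counts of test samples on which the two models are correct or wrong, not to scores, which is part of why I doubt my table. Below is a minimal sketch of that count-based version. The labels and predictions are made up for illustration (they are not my real data), the function names `mcnemar_table` and `mcnemar_exact_p` are hypothetical, and the p-value uses the exact binomial form of the test rather than the implementation linked above.

```python
from math import comb


def mcnemar_table(y_true, pred_1, pred_2):
    """Build 2x2 counts: rows = model 2 (correct, wrong), columns = model 1 (correct, wrong)."""
    table = [[0, 0], [0, 0]]
    for truth, p1, p2 in zip(y_true, pred_1, pred_2):
        row = 0 if p2 == truth else 1
        col = 0 if p1 == truth else 1
        table[row][col] += 1
    return table


def mcnemar_exact_p(table):
    """Two-sided exact McNemar p-value, based only on the two discordant cells."""
    b = table[0][1]  # model 2 correct, model 1 wrong
    c = table[1][0]  # model 2 wrong, model 1 correct
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is nothing to test
    k = min(b, c)
    # Binomial(n, 0.5) probability of a split at least as extreme as min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up example data (hypothetical, for illustration only)
y_true = [0, 1, 2, 1, 0, 2, 1, 0]
pred_1 = [0, 1, 2, 0, 0, 1, 1, 0]
pred_2 = [0, 1, 1, 1, 0, 2, 1, 2]

table = mcnemar_table(y_true, pred_1, pred_2)
p_value = mcnemar_exact_p(table)
print(table, p_value)
```

This works on hard class predictions, so it compares accuracy rather than MAUC, and that is exactly the gap I am unsure how to bridge.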
Could someone please help me out here? I'm very confused. Thank you!
Topic auc difference multiclass-classification
Category Data Science