Evaluation Metric for Imbalanced and Ordinal Classification
I'm looking for an ML evaluation metric that would work well with imbalanced and ordinal multiclass datasets:
Imagine you want to predict the severity of a disease that has 4 grades, where 1 is mild and 4 represents the worst outcome. Realistically, this dataset would have the vast majority of patients in the mild zone (classes 1 or 2) and far fewer in classes 3 and 4, i.e. it is an imbalanced/skewed dataset.
In this example, a classifier that predicts a grade 4 patient as grade 1 should be penalised more than a classifier that predicts a grade 4 patient as grade 3, and so on, i.e. the classes are ordinal.
If I use MCC, Cohen's kappa, etc., I can account for the imbalance in the dataset but not for the ordinal nature of its classes. Do you know of a metric that accounts for both, or of a way to modify or combine metrics so that both aspects of the dataset are taken into account? (Python is preferred, but other languages or a mathematical explanation would also work.)
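To make the problem concrete, here is a minimal sketch in plain Python. The toy labels and the helper names `cohen_kappa` and `mean_abs_grade_error` are made up for illustration only. It computes unweighted Cohen's kappa by hand for two hypothetical classifiers that make the same number of mistakes but differ in how far off those mistakes are. The mean absolute grade error is included only to show the distance information I would like the metric to see; it is not a proposed solution, since it ignores class imbalance.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    # Observed agreement: fraction of exact matches.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement from the two sets of class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def mean_abs_grade_error(y_true, y_pred):
    """Average distance (in grades) between true and predicted class."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: mostly mild cases, few severe ones.
y_true = [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4]

# "near": one grade 3 and one grade 4 patient are swapped (errors of 1 grade).
near = [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 3, 4]

# "far": one grade 1 and one grade 4 patient are swapped (errors of 3 grades).
far = [4, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 1, 4]

for name, y_pred in [("near", near), ("far", far)]:
    print(name, cohen_kappa(y_true, y_pred), mean_abs_grade_error(y_true, y_pred))
```

With these toy labels both classifiers get exactly the same kappa, because they have the same number of exact matches and the same predicted class frequencies, even though the mean absolute grade error of `far` is three times that of `near`. That is the behaviour I would like the metric to avoid.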