Checking model stability: performance across different classes
I am working on a multi-class classification problem. The goal is to predict whether a match will be won by the home team (H), won by the away team (A), or end in a draw (D). After feature engineering on the raw attributes, I arrived at the final dataset used to train a classifier. I made sure the data is balanced across all three classes.
To train a classifier I tried an XGBoost classifier, logistic regression, an SGD classifier, and a plain DNN (TensorFlow Estimator). I compared the metrics for all of them and am reporting the best one.
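For reference, here is a minimal sketch of how the per-class columns in the reports below (pre, rec, spe, f1, geo, sup) can be computed from one-vs-rest counts. The function name `per_class_report` and the toy labels are purely illustrative and are not my match data. The column names resemble the output of imbalanced-learn's `classification_report_imbalanced`, but treat that mapping as an assumption. The iba column is left out to keep the sketch short.

```python
from collections import Counter
from math import sqrt


def per_class_report(y_true, y_pred, labels=("A", "D", "H")):
    """Per-class precision, recall, specificity, F1, geometric mean, support.

    Each class is scored one-vs-rest. 'geo' is taken as
    sqrt(recall * specificity), which is an assumption about
    how the geo column in the reports is defined.
    """
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    total = len(y_true)
    report = {}
    for c in labels:
        tp = pairs[(c, c)]
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        fp = sum(n for (t, p), n in pairs.items() if t != c and p == c)
        tn = total - tp - fn - fp
        pre = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        spe = tn / (tn + fp) if tn + fp else 0.0
        f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
        report[c] = {
            "pre": pre,
            "rec": rec,
            "spe": spe,
            "f1": f1,
            "geo": sqrt(rec * spe),
            "sup": tp + fn,  # number of true samples of this class
        }
    return report


# Toy labels for illustration only (hypothetical, not the real dataset).
y_true = ["A", "A", "D", "D", "H", "H"]
y_pred = ["A", "D", "D", "D", "H", "A"]

report = per_class_report(y_true, y_pred)
for label, row in report.items():
    print(label, {k: round(v, 2) for k, v in row.items()})
```

As a sanity check against the validation report, class A has rec 0.69 and spe 0.79, and sqrt(0.69 * 0.79) is about 0.74, which matches the geo column.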
Linear SGD Classifier Performance on Validation Set
Class      pre   rec   spe   f1    geo   iba   sup
A          0.58  0.69  0.79  0.63  0.74  0.54  275
D          0.51  0.61  0.66  0.55  0.63  0.40  338
H          0.81  0.50  0.94  0.62  0.69  0.45  315
Avg/mean   0.63  0.60  0.79  0.60  0.68  0.46  928
Model Performance on Test Set
Class      pre   rec   spe   f1    geo   iba   sup
A          0.87  0.55  0.97  0.67  0.73  0.51  84
D          0.43  0.69  0.66  0.53  0.67  0.45  83
H          0.80  0.69  0.86  0.74  0.77  0.58  139
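To put the two reports side by side, here is a minimal sketch that computes the test-minus-validation gap per class, using only the pre, rec, and f1 values from the tables above. The names `validation`, `test`, and `metric_gaps` are made up for illustration, and using the raw gap as a rough stability indicator is an assumption, not an established criterion.

```python
# Values copied from the validation and test reports above.
validation = {
    "A": {"pre": 0.58, "rec": 0.69, "f1": 0.63},
    "D": {"pre": 0.51, "rec": 0.61, "f1": 0.55},
    "H": {"pre": 0.81, "rec": 0.50, "f1": 0.62},
}
test = {
    "A": {"pre": 0.87, "rec": 0.55, "f1": 0.67},
    "D": {"pre": 0.43, "rec": 0.69, "f1": 0.53},
    "H": {"pre": 0.80, "rec": 0.69, "f1": 0.74},
}


def metric_gaps(val, tst):
    """Return test-minus-validation differences per class and metric."""
    return {
        label: {m: round(tst[label][m] - val[label][m], 2) for m in val[label]}
        for label in val
    }


gaps = metric_gaps(validation, test)
for label, row in gaps.items():
    print(label, row)
```

A gap close to zero means the metric behaves similarly on both sets. A large gap in either direction means the metric shifts between validation and test for that class.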
The model looks stable for classes A and H, but precision for class D is poor (0.51 on the validation set and 0.43 on the test set). I suspect the features do not carry enough signal for class D, even though I did several rounds of EDA and feature engineering to increase its recall.
My question is: can this model be considered stable?