Relating ROC curves with class statistics
I have three neural network models that I run on the same dataset (7 classes), and for each one I compute per-class performance and ROC curves. The first model is a 4-layer network with 8 neurons in each layer, the second is a 3-layer network with 32 nodes in each layer, and the last is a 2-layer network with 64 nodes in each layer. The class statistics and ROC curves for each network are shown below:
4x8 Network:
              precision    recall  f1-score   support
         0.0      0.999     1.000     1.000     86582
         1.0      0.688     0.494     0.575      1732
         2.0      0.490     0.266     0.345       267
         3.0      0.929     0.955     0.942      8878
         4.0      0.000     0.000     0.000        70
         5.0      0.155     0.726     0.256       117
         6.0      0.740     0.520     0.611       148
    accuracy                          0.983     97794
   macro avg      0.572     0.566     0.533     97794
weighted avg      0.984     0.983     0.983     97794
3x32 Network:
              precision    recall  f1-score   support
         0.0      0.999     1.000     0.999     86582
         1.0      0.690     0.622     0.654      1732
         2.0      0.547     0.367     0.439       267
         3.0      0.929     0.960     0.944      8878
         4.0      0.000     0.000     0.000        70
         5.0      0.330     0.325     0.328       117
         6.0      0.667     0.338     0.448       148
    accuracy                          0.985     97794
   macro avg      0.595     0.516     0.545     97794
weighted avg      0.984     0.985     0.984     97794
2x64 Network:
              precision    recall  f1-score   support
         0.0      0.999     1.000     0.999     86582
         1.0      0.689     0.641     0.664      1732
         2.0      0.411     0.139     0.207       267
         3.0      0.932     0.957     0.944      8878
         4.0      0.000     0.000     0.000        70
         5.0      0.241     0.453     0.315       117
         6.0      0.800     0.378     0.514       148
    accuracy                          0.985     97794
   macro avg      0.582     0.510     0.520     97794
weighted avg      0.984     0.985     0.984     97794
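To make the three tables easier to compare, the per-class F1 scores can be lined up side by side. The snippet below is only a small sketch: the numbers are copied by hand from the tables above, and the helper names (`best_models`, `macro_f1`) are made up for illustration.

```python
# Per-class F1 scores copied from the three tables above (classes 0 to 6).
f1 = {
    "4x8":  [1.000, 0.575, 0.345, 0.942, 0.000, 0.256, 0.611],
    "3x32": [0.999, 0.654, 0.439, 0.944, 0.000, 0.328, 0.448],
    "2x64": [0.999, 0.664, 0.207, 0.944, 0.000, 0.315, 0.514],
}

def best_models(scores, cls):
    """Return the model name(s) with the highest F1 for one class (ties kept)."""
    top = max(s[cls] for s in scores.values())
    return [name for name, s in scores.items() if s[cls] == top]

def macro_f1(scores):
    """Unweighted mean of the per-class F1 scores for each model."""
    return {name: round(sum(s) / len(s), 3) for name, s in scores.items()}

winners = {cls: best_models(f1, cls) for cls in range(7)}
macro = macro_f1(f1)

for cls, names in winners.items():
    print(f"class {cls}: best F1 from {', '.join(names)}")
print("macro F1:", macro)
```

The macro averages computed this way should agree with the "macro avg" rows in the tables, which is a quick check that the numbers were copied correctly.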
Looking at the ROC graphs, I conclude that the 2x64 network is superior to the other two in all classes, but from the tables and the F1 scores I prefer the 3x32 network, as it performs better in most classes. The AUC is almost always 1 for all classes except class 4 in the 3x32 network, which doesn't make sense to me given the wide range of precision and recall values that I get (also, class 4 has zero precision and recall in all models). In short, I find the F1 statistics much clearer than the ROC curves, and I cannot relate the two concepts to each other, but I think there should be a unified explanation.
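One way the two views might be reconciled is class imbalance: a one-vs-rest AUC only measures how well positives are ranked above negatives, while precision also depends on how many negatives there are. The sketch below is not my actual models. It uses synthetic scores, assuming Gaussian score distributions, a rare class of 100 samples against 50000 others, and an arbitrary threshold of 2.5, purely to show that a high AUC and a low precision can coexist. In the multiclass case the label comes from the argmax over classes rather than a fixed threshold, so this is only an analogy.

```python
import random

def roc_auc(pos_scores, neg_scores):
    """AUC via the rank-sum (Mann-Whitney U) identity; assumes no tied scores."""
    labelled = sorted([(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores])
    rank_sum = sum(rank for rank, (_, y) in enumerate(labelled, start=1) if y == 1)
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def precision_recall(pos_scores, neg_scores, threshold):
    """Precision and recall when every score >= threshold is called positive."""
    tp = sum(s >= threshold for s in pos_scores)
    fp = sum(s >= threshold for s in neg_scores)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(pos_scores)
    return precision, recall

# Synthetic, assumed score distributions: the rare class scores higher on
# average, but the majority class is 500 times larger.
rng = random.Random(0)
neg = [rng.gauss(0.0, 1.0) for _ in range(50000)]
pos = [rng.gauss(3.0, 1.0) for _ in range(100)]

auc = roc_auc(pos, neg)
precision, recall = precision_recall(pos, neg, threshold=2.5)
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f"AUC={auc:.3f} precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Under these assumptions the AUC stays close to 1 while the precision is much lower, because even a small false-positive rate on 50000 negatives produces more false positives than there are true positives.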
Topic classification performance
Category Data Science