Low accuracy on the test set
I have a dataset with 16 features and 32 class labels, and I observe the following behavior:
Neural network classification: training accuracy is 100%, but test accuracy is only about 3%, which is essentially random guessing (with 32 classes, chance level is 1/32 ≈ 3%). If I make the network less flexible (fewer neurons or hidden layers), train and test accuracy both end up around 10%. A sketch of the setup follows below.
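To make the setup concrete, here is a simplified sketch of the neural network experiment in scikit-learn terms (the framework and hyperparameters are stand-ins, not my exact code, and `X`/`y` below are synthetic data shaped like my real dataset):

```python
# Simplified sketch of the neural-network experiment, assuming scikit-learn.
# X and y are synthetic stand-ins shaped like my real data (16 features,
# 32 classes), not the actual dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(
    n_samples=2000, n_features=16, n_informative=8,
    n_classes=32, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# "Flexible" network: two wide hidden layers that can memorize the train set.
net = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=1000, random_state=0)
net.fit(X_train, y_train)
print("train accuracy:", net.score(X_train, y_train))
print("test accuracy: ", net.score(X_test, y_test))
```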
Gradient boosting tree classification: exactly the same behavior (sketched below). A flexible model reaches 100% training accuracy but near-random test accuracy; if I reduce the flexibility, train and test accuracy are both very low, around 10%.
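The gradient boosting run follows the same evaluation pattern, again sketched with scikit-learn and reusing the split from the snippet above; tree depth is the flexibility knob here:

```python
# Same evaluation for gradient-boosted trees, reusing X_train/X_test from
# the sketch above; max_depth controls how flexible the model is.
from sklearn.ensemble import HistGradientBoostingClassifier

for max_depth in (2, None):  # shallow trees vs. unrestricted depth
    gbt = HistGradientBoostingClassifier(
        max_depth=max_depth, max_iter=200, random_state=0
    )
    gbt.fit(X_train, y_train)
    print(f"max_depth={max_depth}: "
          f"train={gbt.score(X_train, y_train):.2f}, "
          f"test={gbt.score(X_test, y_test):.2f}")
```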
What could be the reason? How can I fix it? Is there any other algorithm I could try?
Here is the distribution of the target labels: