Low accuracy on the test set

I have a dataset with 16 features and 32 class labels, which shows the following behavior:

  1. Neural network classification: high accuracy on the training set (100%), but low accuracy on the test set (3%, almost like random classification). If I make the network less flexible (reduce the number of neurons or hidden layers), the train and test accuracy both become about 10%.

  2. Gradient boosting tree classification: exactly the same behavior. A flexible model reaches 100% accuracy on the training set but random accuracy on the test set. If I reduce the flexibility, both train and test accuracy drop to around 10%. (The evaluation setup is sketched below.)
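Here is roughly my evaluation setup, as a minimal runnable sketch using scikit-learn; make_classification only stands in for my real 16-feature / 32-class data, so the exact numbers will differ, but the train-vs-test comparison is the point:

    # Synthetic stand-in for the real 16-feature / 32-class dataset.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=1000, n_features=16,
                               n_informative=8, n_classes=32, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)

    for model in (MLPClassifier(hidden_layer_sizes=(128, 128),
                                max_iter=2000, random_state=0),
                  GradientBoostingClassifier(n_estimators=50, max_depth=3,
                                             random_state=0)):
        model.fit(X_train, y_train)
        # Compare accuracy on the data the model saw vs. held-out data.
        print(type(model).__name__,
              "train: %.2f" % model.score(X_train, y_train),
              "test: %.2f" % model.score(X_test, y_test))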

What could be the reason? How can I fix this? Any other algorithm I can try?

Here is the distribution of the target data: [plot of per-class counts, not reproduced here]



This looks like a textbook case of overfitting to the training data. You need to use a simpler model or reduce the complexity of the existing one, for example through regularization.
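As a rough sketch (assuming scikit-learn, with illustrative rather than tuned parameter values), both of your models expose regularization knobs that reduce effective complexity:

    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.neural_network import MLPClassifier

    # Smaller network plus an L2 penalty (alpha) and early stopping on an
    # internal validation split.
    nn = MLPClassifier(hidden_layer_sizes=(32,), alpha=1e-2,
                       early_stopping=True, validation_fraction=0.1,
                       max_iter=2000, random_state=0)

    # Shallower trees, fewer boosting rounds, a smaller learning rate, and
    # row subsampling all act as regularizers for gradient boosting.
    gbt = GradientBoostingClassifier(max_depth=2, n_estimators=50,
                                     learning_rate=0.05, subsample=0.8,
                                     random_state=0)

Tune these settings against a validation set or with cross-validation, never against the test set.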

Also, to better understand the problem, please share the number of observations for each class. I presume you don't have enough data to build a good model: with 32 classes, even a few thousand observations leaves only around a hundred examples per class.
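If your labels are in an array y, a few lines will produce those counts (the random labels below are only a stand-in for your real label vector):

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.integers(0, 32, size=500)   # stand-in for the real labels

    # Per-class observation counts.
    labels, counts = np.unique(y, return_counts=True)
    for label, count in zip(labels, counts):
        print("class %2d: %3d observations" % (label, count))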
