Do high accuracy metrics on a small (but equally sampled) dataset mean the model is good?

I have been training my CNN on 200 images per class for a binary classification problem. On my test data (25 images per class) I am getting good accuracy, precision and recall values. Does that mean my model is actually good?

Topic cnn image-classification cross-validation neural-network



You could read some papers about the problems of working with small datasets, such as this one, https://arxiv.org/pdf/1611.03199.pdf, which notes:

Recent work has demonstrated that standard machine-learning techniques such as random forests and simple deep-networks are capable of learning meaningful chemical information from only a few hundred compounds

Although this example isn't about images (I recommend looking at medical imaging problems tackled with CNNs), such challenges are widespread in fields where it is difficult to obtain a sufficient amount of labeled data, medicine being one example. The point is that it is possible to build an appropriate model from a small dataset and still judge the quality of its performance. If the data your algorithm will later be used on has the same representation as your training and test data, it is quite possible that your model is good enough.


You can do cross-validation to make sure your test set is not just unusually easy to classify. If the scores stay high on every fold, rather than on one lucky split, you can trust them more.
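A minimal sketch of stratified k-fold cross-validation in plain Python, assuming you pool the 200 training and 25 test images per class (225 per class) and re-split them. Each fold keeps the 50/50 class ratio. The function `majority_baseline` is a hypothetical stand-in for "train the CNN on `train_idx` and return its accuracy on `test_idx`"; replace it with your own training code.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, stdev


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs where every fold keeps the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's shuffled indices round-robin across the k folds.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


def cross_validate(labels, evaluate_fold, k=5, seed=0):
    """Return mean and standard deviation of the per-fold scores."""
    scores = [evaluate_fold(tr, te) for tr, te in stratified_k_fold(labels, k, seed)]
    return mean(scores), stdev(scores)


# Assumed setup: 225 images per class, labels 0 and 1.
labels = [0] * 225 + [1] * 225


def majority_baseline(train_idx, test_idx):
    # Hypothetical placeholder for your CNN: always predicts the
    # most common class in the training fold.
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


mean_acc, std_acc = cross_validate(labels, majority_baseline)
print(f"accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```

A large spread between folds is a warning sign that a single 50-image test set is too small to judge the model by.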

If possible, you could also try to enlarge your training set with data augmentation: rotations, shifts, flips, and so on. If you are using Keras, you can read this blog.
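A minimal NumPy sketch of this kind of augmentation, assuming square images stored as `(height, width, channels)` arrays. It uses random horizontal flips, 90-degree rotations and small shifts. The shift here is `np.roll`, which wraps pixels around the border; real pipelines usually pad or crop instead. The function names and the `copies`/`max_shift` parameters are illustrative choices, not part of any library.

```python
import numpy as np


def augment(image, rng, max_shift=2):
    """Return a randomly flipped, rotated (multiple of 90 deg) and shifted copy."""
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)                        # horizontal flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))  # 0, 90, 180 or 270 degrees
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))  # cyclic shift
    return out


def augment_dataset(images, labels, copies=3, seed=0):
    """Keep the originals and append `copies` augmented versions of each image."""
    rng = np.random.default_rng(seed)
    new_x, new_y = list(images), list(labels)
    for x, y in zip(images, labels):
        for _ in range(copies):
            new_x.append(augment(x, rng))
            new_y.append(y)
    return np.stack(new_x), np.array(new_y)


# Assumed setup: 200 random 16x16 RGB "images" per class.
images = np.random.default_rng(1).random((400, 16, 16, 3))
labels = np.repeat([0, 1], 200)
X, Y = augment_dataset(images, labels)
print(X.shape, np.bincount(Y))
```

Augment only the training data (inside each cross-validation training fold), never the test images, otherwise near-copies of test images leak into training and inflate the scores.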
