CNN for subsets of a dataset - how to tune hyperparameters

I have a dataset and would like to train CNNs on subsets of it of different sizes. I already have a CNN that classifies very well when I use the entire dataset. My question is whether I should additionally optimize the hyperparameters of the CNN for each subset, regardless of whether I use data augmentation or not. Does it make sense to change the CNN model for the subsets by using RandomizedSearchCV or GridSearchCV to optimize the number of convolutional layers, the learning rate, and so on?

In other words, suppose I found the perfect CNN model for a dataset. Is this model also the perfect model for subsets of this dataset?

I hope someone can give me a hint. For any help, I thank you in advance.



It depends on how you choose the subset. For any machine learning model to generalize well to unseen data, the training data and the unseen data must come from the same statistical distribution.

So, if you select the subset uniformly at random, it follows the same distribution as the full dataset, and your model will most likely continue to perform well. Otherwise, I think it's better to fine-tune the big model on the subset for better performance.
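To make the point about the distribution concrete, here is a minimal sketch in plain Python. It compares class proportions in a uniformly random subset with those in a subset taken as the first k samples. The toy labels and the helper names (`class_proportions`, `uniform_subset`, `max_proportion_gap`) are assumptions made for illustration; they are not part of your setup or of any library.

```python
import random
from collections import Counter


def class_proportions(labels):
    """Return {class: share of samples} for a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {c: counts[c] / total for c in sorted(counts)}


def uniform_subset(labels, fraction, seed=0):
    """Pick indices uniformly at random, without replacement."""
    rng = random.Random(seed)
    k = int(len(labels) * fraction)
    return sorted(rng.sample(range(len(labels)), k))


def max_proportion_gap(full, subset):
    """Largest absolute difference in class share between two label lists."""
    p_full = class_proportions(full)
    p_sub = class_proportions(subset)
    return max(abs(p_full[c] - p_sub.get(c, 0.0)) for c in p_full)


# Hypothetical toy labels: three imbalanced classes, stored sorted by class.
labels = [0] * 6000 + [1] * 3000 + [2] * 1000

for fraction in (0.5, 0.1):
    idx = uniform_subset(labels, fraction)
    random_labels = [labels[i] for i in idx]
    first_labels = labels[: len(idx)]  # non-random: simply the first k samples
    print(
        fraction,
        max_proportion_gap(labels, random_labels),
        max_proportion_gap(labels, first_labels),
    )
```

If the gap for your own subsets is small, the class distribution matches the full dataset and the existing hyperparameters are a reasonable starting point. A large gap suggests the subset was not drawn at random, which is the case where fine-tuning is more likely to help. Note that this only checks the label distribution, not the distribution of the images themselves.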
