How to improve a CNN without changing the architecture?

I'm currently using an autoencoder CNN built on the VGG-16 architecture, designed by someone else. I want to replicate their results on their dataset first, but I'm finding that:

- Validation loss diverges from training loss fairly early on (by around 10 epochs it already looks like it's overfitting).
- At its best, the validation loss isn't close to as low as the training loss.
- Overall, the accuracy is still worse than reported in their paper.

I'm new to machine learning and want to know which hyperparameters I should try changing, or what else I can tinker with, without modifying the architecture.
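One hyperparameter-level change that directly targets early overfitting is early stopping: monitor the validation loss and stop training once it stops improving. A minimal sketch of the idea (the loss values below are made up for illustration):

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop under early
    stopping (no improvement for `patience` consecutive epochs), or None."""
    best = float("inf")
    since_best = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            since_best = 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return None

# Hypothetical run: validation loss bottoms out at epoch 3, then rises.
losses = [0.9, 0.7, 0.6, 0.55, 0.58, 0.60, 0.62, 0.65]
print(early_stop_epoch(losses, patience=3))  # -> 6
```

Most frameworks provide this as a built-in callback (e.g. `EarlyStopping` in Keras), so in practice you would enable that rather than hand-rolling the loop.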

Topic: finetuning, hyperparameter-tuning, cnn, training, neural-network

Category: Data Science


Are you in fact using the same architecture as they are? If not, that could be the problem.

Otherwise, are you using the same training protocol as they do, i.e. optimizer, learning rate, learning rate schedule, batch size, preprocessing, weight initialization, and number of training epochs? Depending on the size of your model and the amount of training data, 10 epochs might not be enough to judge your model's performance.
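The learning rate schedule in particular is easy to get wrong when reproducing someone else's results. As a sketch, here is a common step-decay schedule in plain Python; the specific base rate, step size, and decay factor below are illustrative, not values from the paper:

```python
def step_decay_lr(base_lr, epoch, step=10, gamma=0.5):
    """Learning rate at a given epoch under step decay:
    multiply by `gamma` every `step` epochs."""
    return base_lr * (gamma ** (epoch // step))

for epoch in (0, 9, 10, 25):
    print(epoch, step_decay_lr(0.01, epoch))
```

If the original authors halved the learning rate on a schedule like this and you trained at a constant rate (or vice versa), you would expect both worse final accuracy and different loss curves, so it is worth checking this detail explicitly.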

Can you link the paper?
