Which model should training continue from when using an adaptive learning rate: the most recent one or the one with the lowest validation loss?
I am currently implementing an adaptive learning rate for a neural network: the learning rate is reduced (e.g., halved) every time the validation loss has not improved for 3 epochs (3 is just an example; it could be any n epochs).
Consider the following validation loss per epoch:
epoch 0, val loss 0.3
epoch 1, val loss 0.29
epoch 2, val loss 0.28
epoch 3, val loss 0.27
epoch 4, val loss 0.26
epoch 5, val loss 0.265
epoch 6, val loss 0.264
epoch 7, val loss 0.263
In this case, the last 3 epochs (5, 6 and 7) have not yielded a new validation loss minimum; the best value is still 0.26 from epoch 4. Therefore, epoch 8 should train with half the learning rate.
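To make the rule concrete, here is a minimal sketch of the plateau check applied to the losses listed above. The function name `plateau_schedule`, the initial learning rate of 0.1, and the choice to reset the counter after each reduction are my own assumptions for illustration, not taken from any particular library.

```python
def plateau_schedule(val_losses, initial_lr=0.1, factor=0.5, patience=3):
    """Replay a list of validation losses and apply the plateau rule.

    Returns the learning rate to use for the NEXT epoch after each
    observed epoch, together with the index of the best epoch.
    All defaults are illustrative assumptions.
    """
    lr = initial_lr
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    next_lrs = []
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New validation minimum: remember it and reset the counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            # Plateau detected: reduce the learning rate, restart counting.
            lr *= factor
            epochs_without_improvement = 0
        next_lrs.append(lr)
    return next_lrs, best_epoch


val_losses = [0.3, 0.29, 0.28, 0.27, 0.26, 0.265, 0.264, 0.263]
next_lrs, best_epoch = plateau_schedule(val_losses)
print(best_epoch)    # index of the epoch with the lowest validation loss
print(next_lrs[-1])  # learning rate that epoch 8 would use
```

With these numbers the best epoch is 4, and the reduction is triggered after epoch 7, so epoch 8 is the first one to run with the halved rate.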
My question now is: should epoch 8 continue training from the model of epoch 7 (the most recent), or from the model of epoch 4 (the one with the lowest validation loss)?
Most implementations I have seen on GitHub continue epoch 8 from the model of epoch 7. However, I wonder whether it would make more sense to continue from the model of epoch 4, since it is the best model so far and has not been overfit or trained in a bad direction.
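For reference, the two options differ by only a few lines in a training loop. The following is a framework-agnostic sketch under my own assumptions: `train_one_epoch`, `validate` and the `restore_best` flag are hypothetical names, and the "model" is just a dict holding one weight trained on a toy quadratic loss. It only shows where the two strategies diverge and says nothing about which one generalizes better.

```python
import copy


def train_with_plateau_decay(train_one_epoch, validate, state, lr, epochs,
                             factor=0.5, patience=3, restore_best=False):
    """Plateau-based learning rate decay with an optional rollback.

    restore_best=False: keep training from the most recent model.
    restore_best=True:  roll back to the best checkpoint on each decay.
    All names here are hypothetical, chosen for illustration.
    """
    best_loss = float("inf")
    best_state = copy.deepcopy(state)
    stale = 0
    history = []  # (epoch, lr used, validation loss)
    for epoch in range(epochs):
        state = train_one_epoch(state, lr)
        loss = validate(state)
        history.append((epoch, lr, loss))
        if loss < best_loss:
            # Checkpoint the best model seen so far.
            best_loss, best_state, stale = loss, copy.deepcopy(state), 0
        else:
            stale += 1
        if stale >= patience:
            lr *= factor
            stale = 0
            if restore_best:
                # Option 2: continue from the best checkpoint.
                state = copy.deepcopy(best_state)
            # Option 1: otherwise continue from the latest state.
    return best_state, best_loss, history


# Toy problem (assumption): minimize w**2 with plain gradient descent.
def train_one_epoch(state, lr):
    return {"w": state["w"] - lr * 2.0 * state["w"]}


def validate(state):
    return state["w"] ** 2


for flag in (False, True):
    _, best_loss, history = train_with_plateau_decay(
        train_one_epoch, validate, {"w": 5.0}, lr=1.0, epochs=5,
        restore_best=flag)
    print(flag, best_loss, [lr for _, lr, _ in history])
```

In a real setup the rollback would presumably also have to decide what to do with the optimizer state (momentum, Adam moments), which this sketch ignores.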
Are there any books / papers / blog posts on this?
Thanks in advance!
Topic learning-rate deep-learning machine-learning
Category Data Science