Interpreting loss & accuracy curves from a learning rate range test

I am working on a project doing experiments with the Learning Rate Range Test (for references, see "A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay", "No More Pesky Learning Rate Guessing Games", and "Cyclical Learning Rates for Training Neural Networks", all by L. Smith).

I am not doing exactly the same as in the papers. What my implementation does is vary the learning rate linearly from an initial learning rate to a final one (where lr_init < lr_end), record the loss and accuracy for every batch of images, and then take a moving average of those values.
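For concreteness, here is a minimal sketch of the kind of loop I mean (PyTorch; `model` and `train_loader` are assumed to be defined elsewhere, and the optimizer/hyper-parameter choices are just placeholders, not my exact code):

```python
import torch
import torch.nn as nn

def lr_range_test(model, train_loader, lr_init=1e-5, lr_end=1.0,
                  num_epochs=6, device="cpu"):
    """Linearly increase the learning rate from lr_init to lr_end over all
    batches, recording per-batch learning rate, loss and accuracy."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr_init, momentum=0.9)

    total_steps = num_epochs * len(train_loader)
    lrs, losses, accs = [], [], []
    step = 0

    for _ in range(num_epochs):
        for images, targets in train_loader:
            # linear interpolation between lr_init and lr_end
            lr = lr_init + (lr_end - lr_init) * step / max(total_steps - 1, 1)
            for group in optimizer.param_groups:
                group["lr"] = lr

            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()

            acc = (outputs.argmax(dim=1) == targets).float().mean().item()
            lrs.append(lr)
            losses.append(loss.item())
            accs.append(acc)
            step += 1

    return lrs, losses, accs
```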

I do 5 runs with different random seeds for initializing the network parameters, and average the resulting curves.
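By averaging I simply mean stacking the per-seed curves and taking the mean at each batch index, something like (assuming all runs have the same number of batches):

```python
import numpy as np

def average_runs(curves_per_seed):
    # curves_per_seed: list of per-batch loss (or accuracy) sequences,
    # one per random seed, all of equal length
    return np.stack([np.asarray(c) for c in curves_per_seed]).mean(axis=0)
```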

Below is the result with a very simple CNN on CIFAR-10, run for 6 epochs.

I can observe huge gaps between epochs, which does not resemble what most loss curves look like in the literature I have read. This is a problem for me, since the next step is to take the numerical derivative of the blue (loss) curve to find the learning-rate range with the steepest loss decrease, and that derivative will clearly not be very informative with those jumps.
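By numerical derivative I mean something like finite differences of the smoothed loss with respect to the learning rate, e.g. with np.gradient (a sketch, assuming two 1-D arrays of equal length coming out of the range test):

```python
import numpy as np

def loss_gradient_wrt_lr(lrs, smoothed_losses):
    # d(loss)/d(lr) via finite differences; the most negative region
    # marks the learning rates with the steepest loss decrease
    return np.gradient(np.asarray(smoothed_losses), np.asarray(lrs))
```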

I think this might be caused by either: a) the moving-average window not being long enough, or b) the batch-level loss/accuracy being extremely noisy and sensitive.
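For (a), I could try a longer window or switch to an exponential moving average, along these lines (a sketch; the window size and beta value are arbitrary, and the EMA with bias correction is the kind of smoothing some LR-finder implementations use, not necessarily what the papers do):

```python
import numpy as np

def moving_average(values, window=100):
    # simple boxcar smoothing; longer windows suppress per-batch noise more
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

def ema(values, beta=0.98):
    # exponential moving average with bias correction;
    # beta closer to 1 gives heavier smoothing
    smoothed, avg = [], 0.0
    for i, v in enumerate(values, start=1):
        avg = beta * avg + (1 - beta) * v
        smoothed.append(avg / (1 - beta ** i))
    return np.asarray(smoothed)
```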

I would like to know if someone can explain this behaviour, and how I might fix it.

Thanks a lot in advance

Topic hyperparameter-tuning learning-rate cnn deep-learning

Category Data Science
