Regularization hyperparameter tuning during training

I have an idea for a regularization-hyperparameter selection method, which I haven't encountered before and can't find on Google, but I'm sure someone has already tried it, and I'm wondering what the best practices are.

The most common method for hyperparameter selection is to pick several candidate values (e.g., some values for the L2 regularization strength), train an NN with each, evaluate each NN on a validation set, and select the best one. My idea is to train a single NN, test it on the validation set between epochs, and auto-adjust the regularization hyperparameter between epochs: if the accuracy on the validation set is decreasing from epoch to epoch, increase the L1/L2/dropout strength. Naturally, this can be more efficient than training multiple NNs.
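To make the idea concrete, here is a minimal sketch of what I mean in PyTorch (toy model and random data; the ×2 growth factor and the use of weight_decay as the L2 knob are just illustrative choices):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X_train, y_train = torch.randn(512, 20), torch.randint(0, 2, (512,))
X_val, y_val = torch.randn(128, 20), torch.randint(0, 2, (128,))

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

best_val_acc = 0.0
for epoch in range(20):
    model.train()
    optimizer.zero_grad()
    loss_fn(model(X_train), y_train).backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_acc = (model(X_val).argmax(dim=1) == y_val).float().mean().item()

    # Validation accuracy dropped -> assume overfitting, regularize harder.
    if val_acc < best_val_acc:
        for group in optimizer.param_groups:
            group["weight_decay"] *= 2.0  # arbitrary growth factor
    best_val_acc = max(best_val_acc, val_acc)
```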

It's still a basic idea, and I'm sure it can be developed further. Is there research on this, and are there best practices?

Tags: hyperparameter-tuning, overfitting, regularization, neural-network

Category: Data Science


You can try a couple of things, depending on which framework you are using and how far you want to go with hyperparameter optimisation.

sklearn (with Keras wrappers):

  • GridSearchCV: exhaustively evaluates every combination in the grid; slow, but guaranteed to find the best combination within that grid
  • RandomizedSearchCV: samples combinations instead of enumerating them; much faster and usually nearly as good (a sketch follows this list)
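
To illustrate the sklearn route, here is a minimal sketch. It uses sklearn's own MLPClassifier (rather than a Keras wrapper) so the example is self-contained, and it tunes the L2 penalty alpha; the grid values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

search = GridSearchCV(
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0),
    param_grid={"alpha": [1e-5, 1e-4, 1e-3, 1e-2]},  # candidate L2 strengths
    cv=3,
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Swapping GridSearchCV for RandomizedSearchCV (with param_distributions and n_iter) samples the space instead of enumerating it.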

Keras (with Keras Tuner):

  • BayesianOptimization: tuning with a Gaussian-process surrogate model (sketch after this list)
  • Hyperband: a variation of the HyperBand algorithm (Li, Lisha, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. "Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization." Journal of Machine Learning Research 18 (2018): 1-52.)
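
A minimal Keras Tuner sketch, assuming an illustrative two-layer model and arbitrary search ranges for the dropout rate and L2 strength:

```python
import keras_tuner as kt
import numpy as np
from tensorflow import keras

def build_model(hp):
    # Search a log-scaled L2 strength and a stepped dropout rate.
    l2 = hp.Float("l2", min_value=1e-5, max_value=1e-2, sampling="log")
    dropout = hp.Float("dropout", min_value=0.0, max_value=0.5, step=0.1)
    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu",
                           kernel_regularizer=keras.regularizers.l2(l2)),
        keras.layers.Dropout(dropout),
        keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

tuner = kt.BayesianOptimization(build_model, objective="val_accuracy",
                                max_trials=10, overwrite=True)
X, y = np.random.randn(500, 20), np.random.randint(0, 2, 500)  # toy data
tuner.search(X, y, epochs=5, validation_split=0.2)
print(tuner.get_best_hyperparameters(1)[0].values)
```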

Pytorch (with RayTune):

  • Population Based Training: trains a group of models (or agents) in parallel; poorly performing models periodically copy the weights of better ones and perturb their hyperparameters, so hyperparameters are adapted during training, which is the closest in spirit to what you describe (https://deepmind.com/blog/population-based-training-neural-networks)
  • HyperBand: as above
  • ASHA: compared to the original version of HyperBand, this implementation provides better parallelism and avoids straggler issues during elimination (see the sketch after this list)
  • Median Stopping Rule: implements the simple strategy of stopping a trial if its performance falls below the median of other trials at similar points in time.
  • FIFOScheduler: Simple scheduler that just runs trials in submission order
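
And a Ray Tune sketch using the ASHA scheduler, assuming the classic tune.run API; the trainable below is a dummy that stands in for a real per-epoch training and validation loop:

```python
import numpy as np
from ray import tune
from ray.tune.schedulers import ASHAScheduler

def trainable(config):
    for epoch in range(10):
        # Dummy score peaking near weight_decay = 1e-3; replace with a
        # real epoch of training + validation using config["weight_decay"].
        val_acc = 1.0 - 0.1 * abs(np.log10(config["weight_decay"]) + 3) \
                  + 0.01 * epoch
        tune.report(val_acc=val_acc)  # ASHA uses these reports to stop trials

analysis = tune.run(
    trainable,
    config={"weight_decay": tune.loguniform(1e-6, 1e-1)},
    num_samples=20,
    scheduler=ASHAScheduler(metric="val_acc", mode="max"),
)
print(analysis.get_best_config(metric="val_acc", mode="max"))
```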
