Is there a relationship between learning rate and training set size?

I have a large dataset for training a neural network model, but I don't have enough resources to do proper hyperparameter tuning on the whole thing. My idea is therefore to tune the learning rate on a subset of the data (say 10%). That obviously won't give as good an estimate as the full dataset would, but since it's still a significant amount of data, I would expect the estimate to be good enough.

However, I wonder whether there is some relationship between the learning rate and the training set size, beyond the fact that the estimate will be noisier when subsampling. By a relationship I mean some rule of thumb, for example "increase your optimal learning rate when you increase the training set size". I don't see one (apart from the relationship between learning rate and batch size, but that's a different topic), yet I would love to make sure I'm not missing anything.
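For concreteness, here is a minimal sketch of the subset-tuning idea described above. The build_model helper, the candidate learning rates, the loss, and the epoch budget are all illustrative assumptions, not something given in the question:

import numpy as np
from tensorflow.keras.optimizers import Adam

def tune_lr_on_subset(build_model, X, y, subset_frac=0.1,
                      candidate_lrs=(1e-4, 1e-3, 1e-2)):
    # Draw a random ~10% subset for the cheap tuning runs.
    n = int(len(X) * subset_frac)
    idx = np.random.permutation(len(X))[:n]
    X_sub, y_sub = X[idx], y[idx]

    best_lr, best_loss = None, float("inf")
    for lr in candidate_lrs:
        model = build_model()  # fresh, untrained model for each trial
        model.compile(optimizer=Adam(learning_rate=lr), loss="mse")
        history = model.fit(X_sub, y_sub, validation_split=0.2,
                            epochs=10, verbose=0)
        val_loss = min(history.history["val_loss"])
        if val_loss < best_loss:
            best_lr, best_loss = lr, val_loss
    return best_lr

The open issue in the question is then whether best_lr found this way transfers to training on the full dataset.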

Topic: learning-rate, sampling, deep-learning, neural-network

Category: Data Science


I wonder why you would tune the learning rate (much) at all. On many platforms you can include something like a learning rate scheduler, which decreases the learning rate when no more learning progress is being made. That way you "tune" the learning rate "on the fly" while training.

With Keras this could look like:

from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint

# Stop training once the validation loss has not improved for 50 epochs.
early_stopping = EarlyStopping(monitor='val_loss', patience=50, mode='auto', restore_best_weights=True)
# Shrink the learning rate by 10% after 20 epochs without improvement.
reduce_on_plateau = ReduceLROnPlateau(monitor="val_loss", factor=0.9, patience=20, cooldown=5, verbose=0)
# Save only the best weights seen so far.
checkpoint = ModelCheckpoint("xy.hdf5", monitor='val_loss', verbose=0, save_best_only=True, save_weights_only=True, mode='min')
...
model.fit(..., callbacks=[early_stopping, reduce_on_plateau, checkpoint])

The callbacks above introduce "early stopping" (to stop when no more progress is made), "reduce on plateau" (to decrease the learning rate when required), and "checkpoint" (to save the best model weights seen so far).

Start with some reasonable standard value for the learning rate and decrease if necessary.
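For example, a minimal sketch of such a starting point; the architecture and loss are placeholders rather than part of the answer, and 1e-3 is simply Adam's default:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# Adam's default learning rate (1e-3) is a widely used starting point;
# the ReduceLROnPlateau callback above then lowers it automatically
# whenever progress stalls.
model = Sequential([Dense(64, activation='relu'), Dense(1)])
model.compile(optimizer=Adam(learning_rate=1e-3), loss='mse')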
