How can I deal with this overfitting?

I trained my model for 40 epochs but ended up with curves of this shape. How can I deal with this problem? I used 30,000 samples for training and 5,000 for testing, with lr_schedule = keras.optimizers.schedules.ExponentialDecay(initial_learning_rate=4e-4, decay_steps=50000, decay_rate=0.5). Should I increase the amount of test data or make changes to the model? EDIT: After adding regularization I got this new shape, and the loss now starts from a higher value than in the previous plot; is that normal? Is this …
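For context, a minimal sketch of how the quoted schedule and a regularized model might be wired together in Keras (the layer sizes, L2/dropout values, and loss are illustrative assumptions, not the asker's actual architecture):

```python
from tensorflow import keras

# The schedule quoted in the question: the learning rate halves every 50,000 steps.
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=4e-4,
    decay_steps=50000,
    decay_rate=0.5)

# Placeholder model showing two common regularization options: an L2 weight
# penalty and dropout.
model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(32,),
                       kernel_regularizer=keras.regularizers.l2(1e-4)),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr_schedule),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Note that with kernel_regularizer the L2 penalty term is added to the reported loss, which is why the loss curve typically starts from a higher value than it did before regularization was added.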
Category: Data Science

Learning rate terminology: what is 'reducing' a learning rate?

I'm investigating a loss plateau and various techniques for overcoming it, which led me to this page and statement: Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This callback monitors a quantity and if no improvement is seen for a 'patience' number of epochs, the learning rate is reduced. https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ReduceLROnPlateau I'm confused by this terminology. If my learning rate is 0.001, am I reducing the learning rate when it moves to 0.01, …
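A minimal usage sketch of the callback being discussed (the monitor/factor/patience values are illustrative, not the asker's):

```python
from tensorflow import keras

# "Reducing by a factor of 2-10" means multiplying the current rate by 1/2 to
# 1/10, i.e. making it smaller: with factor=0.1, a learning rate of 0.001
# would become 0.0001, not 0.01.
reduce_lr = keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",   # quantity watched for improvement
    factor=0.1,           # new_lr = old_lr * factor
    patience=5,           # epochs with no improvement before the reduction
    min_lr=1e-6)          # lower bound on the learning rate

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=40, callbacks=[reduce_lr])
```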
Category: Data Science

GAN optimizer settings in Keras

I am working on a Generative Adversarial Network implemented in Keras. I have my generator model G and discriminator D, each created by its own function, and the GAN model is then built from these two models, roughly like this light sample of the code: gopt=Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08) dopt=Adam(lr=0.00005, beta_1=0.9, beta_2=0.999, epsilon=1e-08) opt_gan = Adam(lr=0.00006, beta_1=0.9, beta_2=0.999, epsilon=1e-08) G = gmodel(......) G.compile(loss=...., optimizer=gopt) D = dmodel(..) D.trainable = False GAN = ganmodel(generator_model=G, discriminator_model=D, ...) GAN.compile(loss=["mae", "binary_crossentropy"], loss_weights=[0.5, 0.5], optimizer=opt_gan) D.trainable = True D.compile(loss='binary_crossentropy', optimizer=dopt) now my …
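For reference, a simplified single-loss sketch of the usual Keras compile order for a GAN (the tiny Dense models are placeholders for the asker's gmodel/dmodel, and the two-loss setup from the question is omitted):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Tiny stand-ins for the real generator/discriminator, just to make the
# compile order concrete.
G = keras.Sequential([layers.Dense(64, activation="relu", input_shape=(100,)),
                      layers.Dense(784, activation="tanh")])
D = keras.Sequential([layers.Dense(64, activation="relu", input_shape=(784,)),
                      layers.Dense(1, activation="sigmoid")])

dopt = keras.optimizers.Adam(learning_rate=5e-5, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
opt_gan = keras.optimizers.Adam(learning_rate=6e-5, beta_1=0.9, beta_2=0.999, epsilon=1e-8)

# Compile D for its own training step first...
D.compile(loss="binary_crossentropy", optimizer=dopt)

# ...then freeze it before compiling the combined model, so that only G is
# updated when the combined model is trained.
D.trainable = False
z = keras.Input(shape=(100,))
GAN = keras.Model(z, D(G(z)))
GAN.compile(loss="binary_crossentropy", optimizer=opt_gan)
```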
Category: Data Science

How to tune learning rate with HParams Dashboard on Tensorflow?

In the Tensorflow documentation it is shown how to tune several hyperparameters, but not the learning rate. I have searched how to tune the learning rate using the HParams dashboard but could not find much. The only example is another question on GitHub, but it does not work. Can you please give me some suggestions on this? Should I use a callback function? Or provide different learning rates in hp_optimizer as in the question on GitHub? Or something else? Parts of my code are below: HP_NUM_UNITS = …
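One possible way to expose the learning rate to the HParams dashboard is to treat it as just another hyperparameter; a self-contained sketch follows (the layer sizes, dummy data, and value grid are assumptions for illustration):

```python
import numpy as np
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

HP_LR = hp.HParam('learning_rate', hp.Discrete([1e-2, 1e-3, 1e-4]))

# Dummy regression data so the sketch runs on its own.
x_train, y_train = np.random.rand(256, 10), np.random.rand(256, 1)
x_val, y_val = np.random.rand(64, 10), np.random.rand(64, 1)

def train_run(hparams, logdir):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1)])
    # The learning rate comes straight from the hparams dict.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=hparams[HP_LR]),
                  loss='mse')
    model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5,
              callbacks=[tf.keras.callbacks.TensorBoard(logdir),
                         hp.KerasCallback(logdir, hparams)])  # logs hparams per run

for i, lr in enumerate(HP_LR.domain.values):
    train_run({HP_LR: lr}, f'logs/hparam_tuning/run-{i}')
```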
Category: Data Science

Is there a relationship between learning rate and training set size?

I have a large dataset to use for training a neural network model. However, I don't have enough resources to do proper hyperparameter tuning on the whole dataset. Therefore, my idea is to tune the learning rate on a subset of the data (say 10%), which obviously won't give as good an estimate as the whole dataset would, but since it's already a significant amount of data I would expect it to give an estimate that is good enough. …
Category: Data Science

Does `ReduceLROnPlateau()` have a way to know the metric of the previous epoch when training has to be restarted at, say, epoch 10 using the epoch-9 h5 model?

I use a shared GPU cluster for my NN training. There is a cap of 8 hours per training run. After that I have to restart training from the model output of the epoch it stopped at. I am using 'Keras.ReduceLROnPlateau()' for changing the learning rate. The question is whether ReduceLROnPlateau() has a way to know the metric of the epoch at which the previous run stopped, or does the patience counter start over when I restart training? Is there a way to make patience not reset for each restart …
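One possible workaround, sketched here under the assumption that the callback's internal attributes (`best`, `wait`) keep their current names, is to subclass the callback and restore its state after the built-in reset; this is not a public API and may break across Keras versions:

```python
from tensorflow import keras

class ResumableReduceLROnPlateau(keras.callbacks.ReduceLROnPlateau):
    """Sketch: carry the best metric and the patience counter over from an
    interrupted run. Relies on the internal attributes `best` and `wait`."""

    def __init__(self, previous_best=None, previous_wait=0, **kwargs):
        super().__init__(**kwargs)
        self._previous_best = previous_best
        self._previous_wait = previous_wait

    def on_train_begin(self, logs=None):
        super().on_train_begin(logs)          # this resets best/wait internally
        if self._previous_best is not None:
            self.best = self._previous_best   # best metric seen before the restart
            self.wait = self._previous_wait   # epochs already waited before the restart

# The values would come from your own bookkeeping of the interrupted run.
reduce_lr = ResumableReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5,
                                       previous_best=0.42, previous_wait=2)
```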
Category: Data Science

Is the learning rate linearly related to the time to converge when using Adam?

Say that learning rates of 1e-3 and 1e-4 both lead to the same solution (neither too high nor too small). In terms of convergence measured in epochs, will optim.Adam(model.parameters(), lr=1e-4) take 10 times more epochs than optim.Adam(model.parameters(), lr=1e-3)? So if an optimizer with lr=1e-3 reaches the solution at epoch 130, will an optimizer with lr=1e-4 theoretically get there at epoch 1300? I think my statement is true for vanilla SGD, but Adam has both momentum …
Category: Data Science

Learning rate Scheduler

A very important aspect of deep learning is the learning rate. Can someone tell me how to initialize the learning rate and how to choose the decay rate? I'm sure there are valuable pointers that experienced people in the community can share with others. I've noticed that many choose to write a custom scheduler rather than use the available ones. Can someone tell me why, and what influences the change in the learning rate? And when to describe a learning rate as being …
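For concreteness, a minimal sketch of a hand-rolled schedule plugged into Keras (the hold/decay constants are illustrative starting points, not recommendations):

```python
from tensorflow import keras

# One common pattern: keep the initial rate for a warm-up period, then decay
# it exponentially per epoch.
def schedule(epoch, lr):
    if epoch < 10:
        return lr            # hold the initial rate for the first 10 epochs
    return lr * 0.95         # then shrink it by 5% every epoch

lr_callback = keras.callbacks.LearningRateScheduler(schedule, verbose=1)
# model.fit(..., callbacks=[lr_callback])
```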
Category: Data Science

How to improve the learning rate of an MLP for regression when tanh is used as the activation function with the Adam solver?

I'm trying to use an MLP to approximate a smooth function f : R^3 -> R that takes a point in space as an argument and returns a scalar value. The MLP architecture has a 3-dimensional input layer (for the 3 point coordinates), N hidden layers, and a single linear scalar output layer, since the output should be the function value: [ASCII diagram of the layers: 3 input units, N hidden layers of units, 1 output unit] …
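A minimal stand-in for the described setup in Keras (the width/depth, the target function, and the training settings below are assumptions for illustration only):

```python
import numpy as np
from tensorflow import keras

def f(p):                        # example smooth target f : R^3 -> R
    return np.sin(p[:, 0]) + p[:, 1] * p[:, 2]

x = np.random.uniform(-1, 1, size=(10000, 3))
y = f(x)

model = keras.Sequential([
    keras.layers.Dense(64, activation="tanh", input_shape=(3,)),
    keras.layers.Dense(64, activation="tanh"),
    keras.layers.Dense(1)])       # single linear output for regression

model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
model.fit(x, y, epochs=50, batch_size=64, validation_split=0.1, verbose=0)
```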
Category: Data Science

Variable batch size for inputs of different length

We're fine-tuning a GPT-2 model (using the Adam optimizer) on posts from a social network. The length of these posts varies dramatically: some are only two tokens long, while others span hundreds of tokens. We've defined a cutoff at 256, but creating batches randomly and then padding is quite costly in terms of training time. We are now sorting the posts by length and then sampling randomly in consecutive blocks of n posts, where n is …
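A small sketch of the described bucketing scheme, assuming `posts` is a list of token-id lists and 0 is the padding id (both assumptions, not stated in the question):

```python
import random

def length_bucketed_batches(posts, n, max_len=256):
    # Sort by length so each block of n posts needs minimal padding.
    posts = sorted((p[:max_len] for p in posts), key=len)
    starts = list(range(0, len(posts) - n + 1, n))
    random.shuffle(starts)                      # visit the blocks in random order
    for s in starts:
        batch = posts[s:s + n]
        width = len(batch[-1])                  # longest post in this block
        yield [p + [0] * (width - len(p)) for p in batch]   # pad to block width
```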
Category: Data Science

What ML model to train on when using an adaptive learning rate - the most recent or the one with the lowest validation loss?

I am currently implementing an adaptive learning rate for a neural network, meaning the learning rate gets reduced (e.g., halved) every time the validation error plateaus for 3 epochs (as an example; it could also be some other n epochs). Let's have a look at the following epoch and validation loss progress: epoch 0, val loss 0.3 epoch 1, val loss 0.29 epoch 2, val loss 0.28 epoch 3, val loss 0.27 epoch 4, val loss 0.26 epoch 5, val loss 0.265 epoch 6, …
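One common arrangement of the two concerns in Keras (an assumption about the setup, not necessarily what the asker uses): reduce the rate on a plateau and, independently, keep the weights with the lowest validation loss seen so far.

```python
from tensorflow import keras

callbacks = [
    # Halve the learning rate after 3 epochs without val_loss improvement.
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
    # Keep only the weights with the best val_loss observed so far.
    keras.callbacks.ModelCheckpoint("best.h5", monitor="val_loss",
                                    save_best_only=True),
]
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=50, callbacks=callbacks)
```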
Category: Data Science

Tuning Batch size and Learning rate in neural net

The following MCQ question is provided in "Exam Readiness: AWS Certified Machine Learning - Specialty" document. The correct answer has been marked in the document but I am not able to understand why this option is correct. Question: "A data scientist is working on optimizing a model during the training process by varying multiple parameters. The data scientist observes that, during multiple runs with identical parameters, the loss function converges to different, yet stable, values. What should the data scientist …
Category: Data Science

Decreasing Learning Rate doesn't improve the results

In theory, and in practice (see e.g. this paper), decreasing the learning rate should help the optimizer go "deeper into the valley" and thus decrease the loss and improve the metric. My plan was therefore to train a neural network with a learning rate of 1 until the loss and my metric stay approximately the same for some epochs, then with 0.1, then 0.01, and so on. However, what I'm observing is that the loss of the model stagnates …
Category: Data Science

Tune learning rate while tuning other HP

When doing hyperparameter optimisation, like a random search, should you add the learning rate to the search space? My intuition is that some HPs might work better with a certain LR and be sub-optimal with a lower LR. But if I add the LR to the search space, I fear the random search will only favour high-LR trials, as they will reach a lower loss within the same limited number of max epochs. What would be the right way …
Category: Data Science

Loss & accuracy curves from learning rate range test interpretation

I am working on a project doing experiments with the Learning Rate Range Test (see "A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay"; "No More Pesky Learning Rate Guessing Games"; and "Cyclical learning rates for training deep neural networks" by L. Smith, for references). I am not doing exactly the same as in the papers. What my implementation does is vary the learning rate linearly from an initial learning …
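A sketch of what such a linear range test might look like as a Keras callback (the rate bounds and step count are illustrative assumptions; this is not the asker's implementation):

```python
from tensorflow import keras

class LRRangeTest(keras.callbacks.Callback):
    """Increase the learning rate linearly from lr_min to lr_max over
    total_steps batches and record (learning_rate, loss) pairs, so the loss
    can later be plotted against the learning rate."""

    def __init__(self, lr_min=1e-5, lr_max=1.0, total_steps=1000):
        super().__init__()
        self.lr_min, self.lr_max, self.total_steps = lr_min, lr_max, total_steps
        self.history = []          # (learning_rate, loss) pairs
        self._step = 0

    def on_train_batch_begin(self, batch, logs=None):
        frac = min(self._step / self.total_steps, 1.0)
        lr = self.lr_min + frac * (self.lr_max - self.lr_min)
        keras.backend.set_value(self.model.optimizer.learning_rate, lr)

    def on_train_batch_end(self, batch, logs=None):
        lr = float(keras.backend.get_value(self.model.optimizer.learning_rate))
        self.history.append((lr, logs["loss"]))
        self._step += 1
```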
Category: Data Science

pytorch lightning produces no checkpoint when learning rate fine-tuning is on

My problem concerns the automatic learning rate finder of pytorch lightning. When I use this feature, no checkpoint output is produced at any point during training. I define a trainer which I later use to first tune the learning rate and then fit the model, as the following pseudo-code-like snippet shows: checkpoint = pl.callbacks.ModelCheckpoint(monitor="val_loss", save_last=True, period=1) trainer = pl.Trainer( auto_lr_find=True, max_steps=config["steps"], gpus=config["gpus"], precision=config["precision"], accumulate_grad_batches=config["accumulate_grad_batches"], checkpoint_callback=checkpoint, logger=logger, accelerator='ddp', plugins=[DDPPlugin(find_unused_parameters=True)], ) …
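One way the two phases might be decoupled, sketched under the assumption of the Lightning 1.x API used in the question (`MyModel` stands in for the asker's LightningModule, so the tuning/fitting calls are left commented):

```python
import pytorch_lightning as pl

# Checkpointing configured as in the question, kept minimal here.
checkpoint = pl.callbacks.ModelCheckpoint(monitor="val_loss", save_last=True)
trainer = pl.Trainer(max_steps=1000, callbacks=[checkpoint])

# model = MyModel(...)                          # the asker's LightningModule
# lr_finder = trainer.tuner.lr_find(model)      # run the LR range test by itself
# model.hparams.lr = lr_finder.suggestion()     # adopt the suggested learning rate
# trainer.fit(model)                            # fit normally; checkpoints get written
```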
Category: Data Science

What is going on with this kind of validation loss graph?

I am using stock prices and a whole bunch of indicator values to try to get a TensorFlow model to predict buy, sell, or hold. I think I'm going about this the right way, but when I train the model, I first set a learning rate scheduler to increase the learning rate until the model converges, and I use the learning rate from the graph where the train loss and val loss first make their steepest slope down for the next training …
Category: Data Science

Learning rate of 0 still changes weights in Keras

I just trained a model (SGD) with Keras and was wondering why the change in accuracy and loss from epoch to epoch doesn't really decrease much when I lower the learning rate. So I tested what happens when I set the learning rate to 0, and to my surprise, accuracy and loss still changed from epoch to epoch and I can't find an explanation for that. Does anyone know why this could be happening?
Category: Data Science

Why is the sign of the gradient (plus or minus) not enough for finding the steepest ascent?

Consider a simple 1-D function $y = x^2$ and the problem of finding a maximum with the gradient ascent method. If we start at the point 3 on the x-axis: $$ \frac{\partial f}{\partial x} \biggr\rvert_{x=3} = 2x \biggr\rvert_{x=3} = 6 $$ This means that the direction in which we should move is given by $6$. Gradient ascent gives the update rule: x = old_x + learning_rate * gradient. What I can't understand is why we need to multiply the learning_rate by the gradient. Why can't we just use …
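To make the two candidate update rules concrete with the question's numbers (a worked example, not part of the original question): $$ x_{new} = x_{old} + \eta \, f'(x_{old}) \quad\text{vs.}\quad x_{new} = x_{old} + \eta \,\operatorname{sign}\!\big(f'(x_{old})\big) $$ With $\eta = 0.1$ and $x_{old} = 3$, the first rule moves by $0.1 \cdot 6 = 0.6$ to $x_{new} = 3.6$, while the sign-only rule always moves by the fixed amount $0.1$, regardless of how steep the function is at $x_{old}$.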
Category: Data Science
