How can I deal with this overfitting?

I trained my model for 40 epochs but ended up with curves of this shape. How can I deal with this problem? I used 30,000 samples for training and 5,000 for testing, with lr_schedule = keras.optimizers.schedules.ExponentialDecay(initial_learning_rate=4e-4, decay_steps=50000, decay_rate=0.5). Should I increase the amount of test data or make changes to the model? EDIT: After adding regularization I got this new shape, and the loss now starts from a higher value than in the previous plot; is that normal? Is this …
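For context, a minimal sketch of how the quoted schedule and a regularized model might be wired together in Keras (the layer sizes, L2/dropout values, and loss are illustrative assumptions, not the asker's actual architecture):

```python
from tensorflow import keras

# The schedule quoted in the question: the learning rate halves every 50,000 steps.
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=4e-4,
    decay_steps=50000,
    decay_rate=0.5)

# Placeholder model showing two common regularization options: an L2 weight
# penalty and dropout.
model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(32,),
                       kernel_regularizer=keras.regularizers.l2(1e-4)),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr_schedule),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Note that with kernel_regularizer the L2 penalty term is added to the reported loss, which is why the loss curve typically starts from a higher value than it did before regularization was added.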
Category: Data Science

Learning rate terminology: what is 'reducing' a learning rate?

I'm investigating a loss plateau and various techniques for overcoming it, which led me to this page and statement: Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This callback monitors a quantity and if no improvement is seen for a 'patience' number of epochs, the learning rate is reduced. https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ReduceLROnPlateau I'm confused by this terminology. If my learning rate is 0.001, am I reducing the learning rate when it moves to 0.01, …
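A minimal usage sketch of the callback being discussed (the monitor/factor/patience values are illustrative, not the asker's):

```python
from tensorflow import keras

# "Reducing by a factor of 2-10" means multiplying the current rate by 1/2 to
# 1/10, i.e. making it smaller: with factor=0.1, a learning rate of 0.001
# would become 0.0001, not 0.01.
reduce_lr = keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",   # quantity watched for improvement
    factor=0.1,           # new_lr = old_lr * factor
    patience=5,           # epochs with no improvement before the reduction
    min_lr=1e-6)          # lower bound on the learning rate

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=40, callbacks=[reduce_lr])
```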
Category: Data Science

GAN optimizer settings in Keras

I am working on a Generative Adversarial Network implemented in Keras. I have my generator model G and discriminator D, each created by its own function, and the GAN model is then built from these two models, roughly like this light sample of the code: gopt=Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08) dopt=Adam(lr=0.00005, beta_1=0.9, beta_2=0.999, epsilon=1e-08) opt_gan = Adam(lr=0.00006, beta_1=0.9, beta_2=0.999, epsilon=1e-08) G = gmodel(......) G.compile(loss=...., optimizer=gopt) D = dmodel(..) D.trainable = False GAN = ganmodel(generator_model=G, discriminator_model=D, ...) GAN.compile(loss=["mae", "binary_crossentropy"], loss_weights=[0.5, 0.5], optimizer=opt_gan) D.trainable = True D.compile(loss='binary_crossentropy', optimizer=dopt) now my …
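For reference, a simplified single-loss sketch of the usual Keras compile order for a GAN (the tiny Dense models are placeholders for the asker's gmodel/dmodel, and the two-loss setup from the question is omitted):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Tiny stand-ins for the real generator/discriminator, just to make the
# compile order concrete.
G = keras.Sequential([layers.Dense(64, activation="relu", input_shape=(100,)),
                      layers.Dense(784, activation="tanh")])
D = keras.Sequential([layers.Dense(64, activation="relu", input_shape=(784,)),
                      layers.Dense(1, activation="sigmoid")])

dopt = keras.optimizers.Adam(learning_rate=5e-5, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
opt_gan = keras.optimizers.Adam(learning_rate=6e-5, beta_1=0.9, beta_2=0.999, epsilon=1e-8)

# Compile D for its own training step first...
D.compile(loss="binary_crossentropy", optimizer=dopt)

# ...then freeze it before compiling the combined model, so that only G is
# updated when the combined model is trained.
D.trainable = False
z = keras.Input(shape=(100,))
GAN = keras.Model(z, D(G(z)))
GAN.compile(loss="binary_crossentropy", optimizer=opt_gan)
```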
Category: Data Science

How to tune learning rate with HParams Dashboard on Tensorflow?

In the Tensorflow documentation it is shown how to tune several hyperparameters, but not the learning rate. I have searched how to tune the learning rate using the HParams dashboard but could not find much. The only example is another question on GitHub, but it does not work. Can you please give me some suggestions on this? Should I use a callback function? Or provide different learning rates in hp_optimizer as in the question on GitHub? Or something else? Parts of my code are below: HP_NUM_UNITS = …
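One possible way to expose the learning rate to the HParams dashboard is to treat it as just another hyperparameter; a self-contained sketch follows (the layer sizes, dummy data, and value grid are assumptions for illustration):

```python
import numpy as np
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

HP_LR = hp.HParam('learning_rate', hp.Discrete([1e-2, 1e-3, 1e-4]))

# Dummy regression data so the sketch runs on its own.
x_train, y_train = np.random.rand(256, 10), np.random.rand(256, 1)
x_val, y_val = np.random.rand(64, 10), np.random.rand(64, 1)

def train_run(hparams, logdir):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1)])
    # The learning rate comes straight from the hparams dict.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=hparams[HP_LR]),
                  loss='mse')
    model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5,
              callbacks=[tf.keras.callbacks.TensorBoard(logdir),
                         hp.KerasCallback(logdir, hparams)])  # logs hparams per run

for i, lr in enumerate(HP_LR.domain.values):
    train_run({HP_LR: lr}, f'logs/hparam_tuning/run-{i}')
```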
Category: Data Science

Is there a relationship between learning rate and training set size?

I have a large dataset to use for training a neural network model. However, I don't have enough resources to do proper hyperparameter tuning on the whole dataset. Therefore, my idea is to tune the learning rate on a subset of the data (say 10%), which obviously won't give as good an estimate as the whole dataset would, but since it's already a significant amount of data I would expect it to give an estimate that is good enough. …
Category: Data Science

Does `ReduceLROnPlateau()` have a way to know the metric of the previous epoch when training has to be restarted at, say, epoch 10 using the epoch-9 h5 model?

I use a shared GPU cluster for my NN training. There is a cap of 8 hours per training run. After that I have to restart training from the model output of the epoch it stopped at. I am using 'Keras.ReduceLROnPlateau()' for changing the learning rate. The question is whether ReduceLROnPlateau() has a way to know the metric of the epoch at which the previous run stopped, or does the patience counter start over when I restart training? Is there a way to make patience not reset for each restart …
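One possible workaround, sketched here under the assumption that the callback's internal attributes (`best`, `wait`) keep their current names, is to subclass the callback and restore its state after the built-in reset; this is not a public API and may break across Keras versions:

```python
from tensorflow import keras

class ResumableReduceLROnPlateau(keras.callbacks.ReduceLROnPlateau):
    """Sketch: carry the best metric and the patience counter over from an
    interrupted run. Relies on the internal attributes `best` and `wait`."""

    def __init__(self, previous_best=None, previous_wait=0, **kwargs):
        super().__init__(**kwargs)
        self._previous_best = previous_best
        self._previous_wait = previous_wait

    def on_train_begin(self, logs=None):
        super().on_train_begin(logs)          # this resets best/wait internally
        if self._previous_best is not None:
            self.best = self._previous_best   # best metric seen before the restart
            self.wait = self._previous_wait   # epochs already waited before the restart

# The values would come from your own bookkeeping of the interrupted run.
reduce_lr = ResumableReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5,
                                       previous_best=0.42, previous_wait=2)
```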
Category: Data Science

Is the learning rate linearly related to the time to converge when using Adam?

Say that learning rates of 1e-3 and 1e-4 both lead to the same solution (neither too high nor too small). In terms of convergence measured in epochs, will optim.Adam(model.parameters(), lr=1e-4) take 10 times more epochs than optim.Adam(model.parameters(), lr=1e-3)? So if an optimizer with lr=1e-3 reaches the solution at epoch 130, will an optimizer with lr=1e-4 theoretically get there at epoch 1300? I think my statement is true for vanilla SGD, but Adam has both momentum …
Category: Data Science

Learning rate Scheduler

A very important aspect of deep learning is the learning rate. Can someone tell me how to initialize the learning rate and how to choose the decay rate? I'm sure there are valuable pointers that experienced people in the community can share with others. I've noticed that many choose to write a custom scheduler rather than use the available ones. Can someone tell me why, and what influences the change in the learning rate? And when to describe a learning rate as being …
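For concreteness, a minimal sketch of a hand-rolled schedule plugged into Keras (the hold/decay constants are illustrative starting points, not recommendations):

```python
from tensorflow import keras

# One common pattern: keep the initial rate for a warm-up period, then decay
# it exponentially per epoch.
def schedule(epoch, lr):
    if epoch < 10:
        return lr            # hold the initial rate for the first 10 epochs
    return lr * 0.95         # then shrink it by 5% every epoch

lr_callback = keras.callbacks.LearningRateScheduler(schedule, verbose=1)
# model.fit(..., callbacks=[lr_callback])
```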
Category: Data Science

How to improve the learning rate of an MLP for regression when tanh is used as the activation function with the Adam solver?

I'm trying to use an MLP to approximate a smooth function f : R^3 -> R that takes a point in space as an argument and returns a scalar value. The MLP architecture has a 3-dimensional input layer (for the 3 point coordinates), N hidden layers, and a single linear scalar output layer, since the output should be the function value: [ASCII diagram of the layers: 3 input units, N hidden layers of units, 1 output unit] …
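A minimal stand-in for the described setup in Keras (the width/depth, the target function, and the training settings below are assumptions for illustration only):

```python
import numpy as np
from tensorflow import keras

def f(p):                        # example smooth target f : R^3 -> R
    return np.sin(p[:, 0]) + p[:, 1] * p[:, 2]

x = np.random.uniform(-1, 1, size=(10000, 3))
y = f(x)

model = keras.Sequential([
    keras.layers.Dense(64, activation="tanh", input_shape=(3,)),
    keras.layers.Dense(64, activation="tanh"),
    keras.layers.Dense(1)])       # single linear output for regression

model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
model.fit(x, y, epochs=50, batch_size=64, validation_split=0.1, verbose=0)
```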
Category: Data Science

Variable batch size for inputs of different length

We're fine-tuning a GPT-2 model (using the Adam optimizer) on posts from a social network. The length of these posts varies dramatically: some are only two tokens long, while others span hundreds of tokens. We've defined a cutoff at 256, but creating batches randomly and then padding is quite costly in terms of training time. We are now sorting the posts by length and then sampling randomly in consecutive blocks of n posts, where n is …
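A small sketch of the described bucketing scheme, assuming `posts` is a list of token-id lists and 0 is the padding id (both assumptions, not stated in the question):

```python
import random

def length_bucketed_batches(posts, n, max_len=256):
    # Sort by length so each block of n posts needs minimal padding.
    posts = sorted((p[:max_len] for p in posts), key=len)
    starts = list(range(0, len(posts) - n + 1, n))
    random.shuffle(starts)                      # visit the blocks in random order
    for s in starts:
        batch = posts[s:s + n]
        width = len(batch[-1])                  # longest post in this block
        yield [p + [0] * (width - len(p)) for p in batch]   # pad to block width
```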
Category: Data Science

What ML model to train on when using an adaptive learning rate - the most recent or the one with the lowest validation loss?

I am currently implementing an adaptive learning rate for a neural network, meaning the learning rate gets reduced (e.g., halved) every time the validation error plateaus for 3 epochs (as an example; it could also be some other n epochs). Let's have a look at the following epoch and validation loss progress: epoch 0, val loss 0.3 epoch 1, val loss 0.29 epoch 2, val loss 0.28 epoch 3, val loss 0.27 epoch 4, val loss 0.26 epoch 5, val loss 0.265 epoch 6, …
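One common arrangement of the two concerns in Keras (an assumption about the setup, not necessarily what the asker uses): reduce the rate on a plateau and, independently, keep the weights with the lowest validation loss seen so far.

```python
from tensorflow import keras

callbacks = [
    # Halve the learning rate after 3 epochs without val_loss improvement.
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
    # Keep only the weights with the best val_loss observed so far.
    keras.callbacks.ModelCheckpoint("best.h5", monitor="val_loss",
                                    save_best_only=True),
]
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=50, callbacks=callbacks)
```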
Category: Data Science

Tuning Batch size and Learning rate in neural net

The following MCQ question is provided in "Exam Readiness: AWS Certified Machine Learning - Specialty" document. The correct answer has been marked in the document but I am not able to understand why this option is correct. Question: "A data scientist is working on optimizing a model during the training process by varying multiple parameters. The data scientist observes that, during multiple runs with identical parameters, the loss function converges to different, yet stable, values. What should the data scientist …
Category: Data Science

Decreasing Learning Rate doesn't improve the results

In theory, and in practice (see e.g. this paper), decreasing the learning rate should help the optimizer go "deeper into the valley" and thus decrease the loss and improve the metric. My plan was therefore to train a neural network with a learning rate of 1 until the loss and my metric stay approximately the same for some epochs, then with 0.1, then 0.01, and so on. However, what I'm observing is that the loss of the model stagnates …
Category: Data Science

Tune learning rate while tuning other HP

When doing hyperparameter optimisation, like a random search, should you add the learning rate to the search space? My intuition is that some HPs might work better with a certain LR and be sub-optimal with a lower LR. But if I add the LR to the search space, I fear the random search will only favour high-LR trials, as they will reach a lower loss within the same limited number of max epochs. What would be the right way …
Category: Data Science

Loss & accuracy curves from learning rate range test interpretation

I am working on a project doing experiments with the Learning Rate Range Test (see "A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay"; "No More Pesky Learning Rate Guessing Games"; and "Cyclical learning rates for training deep neural networks" by L. Smith, for references). I am not doing exactly the same as in the papers. What my implementation does is vary the learning rate linearly from an initial learning …
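A sketch of what such a linear range test might look like as a Keras callback (the rate bounds and step count are illustrative assumptions; this is not the asker's implementation):

```python
from tensorflow import keras

class LRRangeTest(keras.callbacks.Callback):
    """Increase the learning rate linearly from lr_min to lr_max over
    total_steps batches and record (learning_rate, loss) pairs, so the loss
    can later be plotted against the learning rate."""

    def __init__(self, lr_min=1e-5, lr_max=1.0, total_steps=1000):
        super().__init__()
        self.lr_min, self.lr_max, self.total_steps = lr_min, lr_max, total_steps
        self.history = []          # (learning_rate, loss) pairs
        self._step = 0

    def on_train_batch_begin(self, batch, logs=None):
        frac = min(self._step / self.total_steps, 1.0)
        lr = self.lr_min + frac * (self.lr_max - self.lr_min)
        keras.backend.set_value(self.model.optimizer.learning_rate, lr)

    def on_train_batch_end(self, batch, logs=None):
        lr = float(keras.backend.get_value(self.model.optimizer.learning_rate))
        self.history.append((lr, logs["loss"]))
        self._step += 1
```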
Category: Data Science

pytorch lightning produces no checkpoint when learning rate fine-tuning is on

My problem concerns the automatic learning rate finder of pytorch lightning. When I use this feature, no checkpoint output is produced at any point during training. I define a trainer which I later use to first tune the learning rate and then fit the model, as the following pseudo-code-like snippet shows: checkpoint = pl.callbacks.ModelCheckpoint(monitor="val_loss", save_last=True, period=1) trainer = pl.Trainer( auto_lr_find=True, max_steps=config["steps"], gpus=config["gpus"], precision=config["precision"], accumulate_grad_batches=config["accumulate_grad_batches"], checkpoint_callback=checkpoint, logger=logger, accelerator='ddp', plugins=[DDPPlugin(find_unused_parameters=True)], ) …
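One way the two phases might be decoupled, sketched under the assumption of the Lightning 1.x API used in the question (`MyModel` stands in for the asker's LightningModule, so the tuning/fitting calls are left commented):

```python
import pytorch_lightning as pl

# Checkpointing configured as in the question, kept minimal here.
checkpoint = pl.callbacks.ModelCheckpoint(monitor="val_loss", save_last=True)
trainer = pl.Trainer(max_steps=1000, callbacks=[checkpoint])

# model = MyModel(...)                          # the asker's LightningModule
# lr_finder = trainer.tuner.lr_find(model)      # run the LR range test by itself
# model.hparams.lr = lr_finder.suggestion()     # adopt the suggested learning rate
# trainer.fit(model)                            # fit normally; checkpoints get written
```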
Category: Data Science

What is going on with this kind of validation loss graph?

I am using stock prices and a whole bunch of indicator values to try to get a TensorFlow model to predict buy, sell, or hold. I think I'm going about this the right way, but when I train the model, I first set a learning rate scheduler to increase the learning rate until the model converges, and I use the learning rate from the graph where the train loss and val loss first make their steepest slope down for the next training …
Category: Data Science

Learning rate of 0 still changes weights in Keras

I just trained a model (SGD) with Keras and was wondering why the change in accuracy and loss from epoch to epoch doesn't really decrease much when I lower the learning rate. So I tested what happens when I set the learning rate to 0, and to my surprise, accuracy and loss still changed from epoch to epoch and I can't find an explanation for that. Does anyone know why this could be happening?
Category: Data Science

Why is the sign of the gradient (plus or minus) not enough for finding the steepest ascent?

Consider a simple 1-D function $y = x^2$ and the problem of finding a maximum with the gradient ascent method. If we start at the point 3 on the x-axis: $$ \frac{\partial f}{\partial x} \biggr\rvert_{x=3} = 2x \biggr\rvert_{x=3} = 6 $$ This means that the direction in which we should move is given by $6$. Gradient ascent gives the update rule: x = old_x + learning_rate * gradient. What I can't understand is why we need to multiply the learning_rate by the gradient. Why can't we just use …
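To make the two candidate update rules concrete with the question's numbers (a worked example, not part of the original question): $$ x_{new} = x_{old} + \eta \, f'(x_{old}) \quad\text{vs.}\quad x_{new} = x_{old} + \eta \,\operatorname{sign}\!\big(f'(x_{old})\big) $$ With $\eta = 0.1$ and $x_{old} = 3$, the first rule moves by $0.1 \cdot 6 = 0.6$ to $x_{new} = 3.6$, while the sign-only rule always moves by the fixed amount $0.1$, regardless of how steep the function is at $x_{old}$.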
Category: Data Science
