How to improve the learning rate of an MLP for regression when tanh is used as the activation function with the Adam solver?

I'm trying to use an MLP to approximate a smooth function f : R^3 -> R that takes a point in space as its argument and returns a scalar value.

The MLP architecture has a 3-dimensional input layer (for the 3 point coordinates), N hidden layers, and a single linear scalar output layer, since the output should be the function value:

   x  x      x
x  x  x      x
x  x  x  ... x  x
x  x  x      x
   x  x      x

I'm using the tanh activation function because I want the model (MLP) to be continuously differentiable.
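
For concreteness, a minimal sketch of what I mean (Keras/TensorFlow assumed; the hidden-layer sizes here are placeholders, not the exact architecture):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(3,)),                       # 3 point coordinates
        tf.keras.layers.Dense(16, activation="tanh"),     # placeholder hidden layers
        tf.keras.layers.Dense(16, activation="tanh"),
        tf.keras.layers.Dense(16, activation="tanh"),
        tf.keras.layers.Dense(1, activation="linear"),    # scalar function value
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")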

I'm playing with different hidden-layer architectures, using the Adam solver, and I get this behavior for the MSE loss.

The maximal validation error that I get with this MSE loss is 99.982942% - is this generally considered accurate for regression?

For the network with hidden layers (16, 16, 16, 16, 16), the error stagnates at first, then drops and oscillates. I suspect the oscillation is caused by vanishing gradients from the tanh activation function - is this true?
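
One rough way to check this (a sketch, assuming the Keras model from the snippet above and a placeholder batch) is to print the per-layer gradient norms and see whether the early layers receive much smaller gradients than the output layer:

    import numpy as np
    import tensorflow as tf

    x_batch = np.random.rand(32, 3).astype("float32")   # placeholder input batch
    y_batch = np.random.rand(32, 1).astype("float32")   # placeholder targets

    with tf.GradientTape() as tape:
        pred = model(x_batch, training=True)             # model from the sketch above
        loss = tf.reduce_mean(tf.square(pred - y_batch))

    grads = tape.gradient(loss, model.trainable_variables)
    for var, grad in zip(model.trainable_variables, grads):
        print(var.name, float(tf.norm(grad)))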

How can I set or improve the learning rate? Are there techniques that prevent oscillations when the solver (Adam, SGD, ...) approaches the optimum?
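
One thing I could try is shrinking the learning rate once the validation loss stops improving; a sketch with Keras callbacks (the patience values and data names are placeholders):

    import tensorflow as tf

    callbacks = [
        # halve the learning rate after 20 epochs without validation improvement
        tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                             patience=20, min_lr=1e-6),
        # stop early and keep the best weights if nothing improves for a long time
        tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=100,
                                         restore_best_weights=True),
    ]

    # history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
    #                     epochs=2000, batch_size=32, callbacks=callbacks)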

Topic mlp learning-rate regression

Category Data Science


If your goal is to set the optimal values of hyperparameters (whether that is the learning rate, the number of layers, the activation function, etc.), you should look into Keras Tuner.

Why the learning curve is oscillating is not clear to me without seeing your code/data. But if you want to find a good learning rate, look into Keras Tuner, provided you ARE using Keras + TensorFlow.
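
A rough sketch of what a Keras Tuner search over the learning rate (and layer sizes) could look like; the search ranges and trial count below are illustrative, not recommendations:

    import keras_tuner as kt
    import tensorflow as tf

    def build_model(hp):
        model = tf.keras.Sequential([tf.keras.Input(shape=(3,))])
        for i in range(hp.Int("num_layers", 2, 5)):
            model.add(tf.keras.layers.Dense(hp.Int(f"units_{i}", 8, 64, step=8),
                                            activation="tanh"))
        model.add(tf.keras.layers.Dense(1))                        # linear scalar output
        lr = hp.Float("learning_rate", 1e-4, 1e-2, sampling="log")
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr), loss="mse")
        return model

    tuner = kt.RandomSearch(build_model, objective="val_loss", max_trials=30)
    # tuner.search(x_train, y_train, validation_data=(x_val, y_val), epochs=200)
    # best_model = tuner.get_best_models(num_models=1)[0]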

If you are using PyTorch as the framework, then you could use any of the following HP tuning methods/packages (a short Optuna sketch follows the list):

1. Ray Tune

2. Optuna

3. Auto-PyTorch

4. BoTorch

There are many more packages; a Google search will turn up others.
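
As an example of the PyTorch route, here is a minimal Optuna sketch that tunes the Adam learning rate; the data tensors, network sizes, and epoch counts are placeholders for your own setup:

    import optuna
    import torch
    import torch.nn as nn

    x_train, y_train = torch.rand(512, 3), torch.rand(512, 1)   # placeholder data
    x_val, y_val = torch.rand(128, 3), torch.rand(128, 1)

    def objective(trial):
        # sample a learning rate on a log scale
        lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
        model = nn.Sequential(nn.Linear(3, 16), nn.Tanh(),
                              nn.Linear(16, 16), nn.Tanh(),
                              nn.Linear(16, 1))
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(200):                                     # placeholder epoch count
            opt.zero_grad()
            loss = loss_fn(model(x_train), y_train)
            loss.backward()
            opt.step()
        with torch.no_grad():
            return loss_fn(model(x_val), y_val).item()           # validation MSE

    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=30)
    print(study.best_params)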

Cheers!
