Should output data scaling correspond to the activation function's output?
I am building an LSTM with Keras, which has an `activation` parameter in the layer. I have read that the scaling of the output data should match the activation function's output range.
For example, tanh outputs values between -1 and 1, so the output training (and testing) data should be scaled to values between -1 and 1. Likewise, if the activation function is a sigmoid, the output data should be scaled to values between 0 and 1.
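To make the setup concrete, here is a minimal sketch of what I mean (the data, layer sizes, and variable names are just placeholders, not my real code): the targets are scaled to [-1, 1] with `MinMaxScaler` to match a tanh output activation, and the predictions are inverse-transformed afterwards.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Dummy data: 100 sequences of 10 timesteps with 3 features each,
# targets on an arbitrary scale (placeholder values).
n_timesteps, n_features = 10, 3
X = np.random.rand(100, n_timesteps, n_features)
y = np.random.rand(100, 1) * 50

# Scale the targets to [-1, 1] to match the tanh output activation.
y_scaler = MinMaxScaler(feature_range=(-1, 1))
y_scaled = y_scaler.fit_transform(y)

model = Sequential([
    LSTM(32, activation='tanh', input_shape=(n_timesteps, n_features)),
    Dense(1, activation='tanh'),  # output bounded to (-1, 1)
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y_scaled, epochs=5, verbose=0)

# Predictions come back in [-1, 1]; invert the scaling to get the original range.
y_pred = y_scaler.inverse_transform(model.predict(X))
```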
Does this hold for all activation functions? If I use ReLU as the activation in my layers, what should the output data be rescaled to?
Topic lstm activation-function normalization feature-scaling deep-learning
Category Data Science