Should output data scaling correspond to the activation function's output?
I am building an LSTM with Keras, which has an activation parameter in the layer. I have read that the scaling of the output data should match the activation function's output range.
For example, a tanh activation outputs values between -1 and 1, so the output training (and testing) data should be scaled to values between -1 and 1. Likewise, if the activation function is a sigmoid, the output data should be scaled to values between 0 and 1.
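To make the idea concrete, here is a minimal sketch of what I mean by matching the target scaling to the output activation (the data, layer sizes, and scaler choice are just illustrative, not my actual setup):

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler
    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    # Dummy data: 100 sequences of length 10 with 1 feature,
    # targets on an arbitrary scale
    X = np.random.rand(100, 10, 1)
    y = np.random.rand(100, 1) * 50

    # Scale targets to [-1, 1] to match a tanh output activation
    y_scaler = MinMaxScaler(feature_range=(-1, 1))
    y_scaled = y_scaler.fit_transform(y)

    model = Sequential([
        LSTM(32, input_shape=(10, 1)),
        Dense(1, activation='tanh'),  # output bounded to (-1, 1)
    ])
    model.compile(loss='mse', optimizer='adam')
    model.fit(X, y_scaled, epochs=5, verbose=0)

    # Predictions come back in [-1, 1]; invert the scaling to get real units
    preds = y_scaler.inverse_transform(model.predict(X))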
Does this hold for all activation functions? If I use ReLU as the activation in my layers, what should the output data be rescaled to?
Topic lstm activation-function normalization feature-scaling deep-learning
Category Data Science