Should output data scaling correspond to the activation function's output?
I am building an LSTM with Keras, which has an `activation` parameter in the layer. I have read that the scaling of the output data should match the activation function's output range.
For example, tanh outputs values between -1 and 1, so the output training (and testing) data should be scaled to values between -1 and 1. Likewise, if the activation function is a sigmoid, the output data should be scaled to values between 0 and 1.
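To make the setup concrete, here is a minimal sketch of what I mean (the data, layer sizes, and variable names are just placeholders, not my real code): the targets are scaled to [-1, 1] with `MinMaxScaler` to match a tanh output activation, and the predictions are inverse-transformed afterwards.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Dummy data: 100 sequences of 10 timesteps with 3 features each,
# targets on an arbitrary scale (placeholder values).
n_timesteps, n_features = 10, 3
X = np.random.rand(100, n_timesteps, n_features)
y = np.random.rand(100, 1) * 50

# Scale the targets to [-1, 1] to match the tanh output activation.
y_scaler = MinMaxScaler(feature_range=(-1, 1))
y_scaled = y_scaler.fit_transform(y)

model = Sequential([
    LSTM(32, activation='tanh', input_shape=(n_timesteps, n_features)),
    Dense(1, activation='tanh'),  # output bounded to (-1, 1)
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y_scaled, epochs=5, verbose=0)

# Predictions come back in [-1, 1]; invert the scaling to get the original range.
y_pred = y_scaler.inverse_transform(model.predict(X))
```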
Does this hold for all activation functions? If I use ReLU as the activation in my layers, what should the output data be rescaled to?
Topic lstm activation-function normalization feature-scaling deep-learning
Category Data Science