Input Standardization for Deep Learning - Proper Scaling
Typically, the input to a neural network (NN) is transformed to have zero mean and a standard deviation of 1.
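For reference, here is a minimal sketch of the usual standardization in NumPy (`X` is just a placeholder for the training data):

```python
import numpy as np

# Placeholder training data: 1000 samples, 20 features
X = np.random.rand(1000, 20)

# Per-feature mean and standard deviation
mean = X.mean(axis=0)
std = X.std(axis=0)

# Standardize: zero mean, unit standard deviation per feature
X_standardized = (X - mean) / std
```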
I wonder why the target standard deviation should be 1. What about other scales, such as 10 or 100? Wouldn't it make sense to provide the NN with input over a wider range, so that it can separate different clusters more easily and handle the loss function for each cluster in a simpler, more robust way? Has anyone here tried different scales and can share their experience? (See the sketch below for what I mean.)
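Concretely, the alternative I have in mind is something like the following, where `scale` would be set to 10 or 100 instead of 1:

```python
import numpy as np

X = np.random.rand(1000, 20)  # placeholder training data

scale = 10.0  # target standard deviation: 1, 10, 100, ...

# Standardize first, then multiply by the target scale,
# giving zero mean and std == scale for every feature
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0) * scale
```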
If the answer depends on the activation function: in my case I use ReLU.
Thanks a lot!
Topic: feature-scaling, deep-learning
Category: Data Science