What are the reasons for drawing initial neural network weights from the Gaussian distribution?
Are there theoretical or empirical reasons for drawing initial weights of a multilayer perceptron from a Gaussian rather than from, say, a Cauchy distribution?
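To make the two options in the question concrete, here is a minimal sketch in plain Python of drawing one layer's initial weights from each distribution. The function names `init_weights` and `max_abs`, the scale of 0.1, and the 100×100 layer size are illustrative assumptions, not part of the question. The Cauchy draws use the inverse CDF, `scale * tan(pi * (u - 0.5))`, because the standard `random` module has no Cauchy sampler.

```python
import math
import random


def init_weights(n_in, n_out, dist="gaussian", scale=0.1, seed=0):
    """Return an n_out x n_in weight matrix (list of lists) for one MLP layer.

    Hypothetical helper for illustration only.
    dist="gaussian": entries ~ Normal(mean=0, std=scale)
    dist="cauchy":   entries ~ Cauchy(location=0, scale=scale), sampled via
                     the inverse CDF scale * tan(pi * (u - 0.5)), u ~ U[0, 1).
    """
    rng = random.Random(seed)

    if dist == "gaussian":
        def draw():
            return rng.gauss(0.0, scale)
    elif dist == "cauchy":
        def draw():
            return scale * math.tan(math.pi * (rng.random() - 0.5))
    else:
        raise ValueError(f"unknown distribution: {dist!r}")

    return [[draw() for _ in range(n_in)] for _ in range(n_out)]


def max_abs(weights):
    """Largest absolute weight in the matrix; heavy tails show up here."""
    return max(abs(w) for row in weights for w in row)


if __name__ == "__main__":
    g = init_weights(100, 100, "gaussian")
    c = init_weights(100, 100, "cauchy")
    print("gaussian max |w|:", max_abs(g))
    print("cauchy   max |w|:", max_abs(c))
```

Comparing `max_abs` for the two matrices makes the difference in tail behavior visible: the Cauchy distribution has no finite mean or variance, so a handful of its draws can be orders of magnitude larger than the rest, while Gaussian draws stay within a few standard deviations of zero.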