Assuming fairly reasonable data normalization, the expectation of the weights should be zero or close to it. It might seem reasonable, then, to set all of the initial weights to zero, since a positive initial weight will have further to go if it should actually be a negative weight and vice versa. This, however, does not work. If all of the weights are the same, every neuron in a layer computes the same output and receives the same gradient update, so the neurons stay identical and the model will not learn anything - there is no source of asymmetry between the neurons.
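Here is a minimal NumPy sketch of the symmetry problem (the tiny two-layer network, the constant starting value of 0.5, and the learning rate are all just illustrative): every hidden unit starts out identical, receives an identical gradient on every step, and therefore stays identical no matter how long you train.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))      # 8 samples, 3 features
y = rng.normal(size=(8, 1))

W1 = np.full((3, 4), 0.5)        # every weight identical (the same happens with zeros)
W2 = np.full((4, 1), 0.5)

for _ in range(100):
    h = np.tanh(X @ W1)                            # all hidden columns are identical
    err = h @ W2 - y
    dW2 = h.T @ err                                # gradient of 0.5 * sum(err**2)
    dW1 = X.T @ ((err @ W2.T) * (1 - h ** 2))
    W2 -= 0.01 * dW2
    W1 -= 0.01 * dW1

print(np.allclose(W1, W1[:, :1]))  # True: the columns (hidden units) never diverge
```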
What we could do instead is to keep the weights very close to zero but make them different by initializing them to small, non-zero numbers. This is what is suggested in the tutorial that you linked. It has the same advantage as all-zero initialization in that it stays close to the 'best guess' expectation value, but the symmetry has also been broken enough for the algorithm to work. For example:
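This is a sketch of that idea; the layer sizes and the 0.01 scale factor are illustrative choices, not values taken from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 100, 50

W = 0.01 * rng.normal(size=(d_in, d_out))   # small, zero-mean, but all different
b = np.zeros(d_out)                         # biases can safely start at zero
```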
Small random initialization has its own problems, though. It is not necessarily true that smaller numbers will work better, especially if the neural network is deep. The gradients calculated in backpropagation are proportional to the weights; very small weights produce very small gradients, which can make the network take much, much longer to train or prevent it from ever converging.
Another potential issue is that the distribution of each neuron's output, when using random initialization values, has a variance that grows with the number of inputs. A common additional step is to normalize the neuron's output variance by dividing its weights by $\sqrt{d}$, where $d$ is the number of inputs to the neuron. If the weights are drawn uniformly from $[-1, 1]$, the resulting weights are uniformly distributed in $\left[-\frac{1}{\sqrt{d}}, \frac{1}{\sqrt{d}}\right]$.
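A short NumPy sketch of that scaling, with arbitrary layer sizes and assuming zero-mean, unit-variance inputs: without the $1/\sqrt{d}$ factor the pre-activation variance grows linearly with $d$; with it, the variance stays constant regardless of $d$ (about $1/3$ here, since $\mathrm{Uniform}(-1, 1)$ has variance $1/3$).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_neurons = 256, 64
x = rng.normal(size=(10_000, d))                      # normalized inputs

W_raw = rng.uniform(-1.0, 1.0, size=(d, n_neurons))   # unscaled weights
W = W_raw / np.sqrt(d)                                # values in [-1/sqrt(d), 1/sqrt(d)]

print((x @ W_raw).var())   # ~ d / 3: grows with the number of inputs
print((x @ W).var())       # ~ 1 / 3: independent of d
```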