Theoretical basis for neural network "effort"

I might be in danger of having my question closed as unclear, but here goes.

Suppose we have a simple feedforward network. It has a few layers, each layer has a reasonable number of neurons, nothing complicated. Let's say the output has size $n$, and there is no final activation function on the output.

My intuition is that the network will have an easier time training to produce some outputs than others. In particular, outputs close to $0$, i.e. closer to the origin of the $\mathbb{R}^n$ output space, should be easier. But this is only an intuition; I'm not sure it's actually true. (By "easier" I think what I really mean is "reached in fewer training iterations".)
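To make concrete what I mean by "fewer iterations", here is the kind of toy experiment I have in mind (a minimal sketch in PyTorch; the architecture, learning rate, and loss threshold are arbitrary choices, not part of the question):

```python
import torch
import torch.nn as nn

def steps_to_fit(target, threshold=1e-3, max_steps=10_000, seed=0):
    """Train a tiny MLP to output `target` for every input; return the
    number of SGD steps until the MSE drops below `threshold`."""
    torch.manual_seed(seed)
    net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, len(target)))
    opt = torch.optim.SGD(net.parameters(), lr=1e-2)
    x = torch.randn(64, 8)                   # fixed random inputs
    y = torch.tensor(target).repeat(64, 1)   # constant target for every input
    for step in range(max_steps):
        loss = nn.functional.mse_loss(net(x), y)
        if loss.item() < threshold:
            return step
        opt.zero_grad()
        loss.backward()
        opt.step()
    return max_steps

# Same direction, increasing distance from the origin.
for scale in (0.1, 1.0, 10.0):
    print(scale, steps_to_fit([scale] * 4))
```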

I haven't found a source for this, but there is a lot of advice on the internet to normalize one's data, which seems to have a similar motivation.
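For regression targets, that advice usually amounts to something like the following sketch (numpy; the numbers are made up):

```python
import numpy as np

# Hypothetical raw targets far from the origin.
rng = np.random.default_rng(0)
y_train = rng.normal(loc=50.0, scale=5.0, size=(1000, 4))

# Standardize so the values the network must produce sit near the origin...
y_mean, y_std = y_train.mean(axis=0), y_train.std(axis=0)
y_scaled = (y_train - y_mean) / y_std

# ...train against y_scaled, then map a network prediction p back with:
# y_pred = p * y_std + y_mean
```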

Is there any theoretical basis for this notion of the effort a network needs to produce an output? Is it even meaningful to talk about this without making assumptions about the function we're trying to learn?

And if this idea is accurate in some way, is the effort independent of direction? Is it like an $n$-variate Gaussian distribution, where the density at a point corresponds to the effort the network needs to produce that point? Or is the distribution spikier, making it easier to output e.g. $[0,0,0,1]$ than $[\frac{1}{2},\frac{1}{2},\frac{1}{2},\frac{1}{2}]$? (Or vice versa?)
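If the toy measurement above is meaningful at all, the direction question could be probed the same way. Reusing the `steps_to_fit` helper from the sketch above, note that the two targets below have the same Euclidean norm, so only the direction differs:

```python
# Same distance from the origin (norm 1), different directions.
print("spike  :", steps_to_fit([0.0, 0.0, 0.0, 1.0]))
print("uniform:", steps_to_fit([0.5, 0.5, 0.5, 0.5]))
```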

Tags: theory, neural-network, machine-learning

Category: Data Science


You asked several questions; I'll answer the one about which specific distributions are easier to learn. Information theory would predict that $[0,0,0,1]$ would be far easier to learn than $[\frac{1}{2},\frac{1}{2},\frac{1}{2},\frac{1}{2}]$.

Read as a probability distribution (after normalizing), $[\frac{1}{2},\frac{1}{2},\frac{1}{2},\frac{1}{2}]$ is the discrete uniform distribution, which has maximum entropy. $[0,0,0,1]$ is a point mass, which has the minimum possible entropy, zero.
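For concreteness, treating the normalized output vectors as probability distributions over four outcomes and using Shannon entropy $H(p) = -\sum_i p_i \log_2 p_i$ (with the convention $0 \log_2 0 = 0$):

$$H\!\left(\left[\tfrac{1}{4},\tfrac{1}{4},\tfrac{1}{4},\tfrac{1}{4}\right]\right) = -4 \cdot \tfrac{1}{4}\log_2\tfrac{1}{4} = 2 \text{ bits}, \qquad H\!\left([0,0,0,1]\right) = -1 \cdot \log_2 1 = 0 \text{ bits},$$

which are the maximum and minimum possible values for four outcomes.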
