What are desirable properties of layers in Deep Learning?
I have been thinking about the following problem: Given some task, we assume there is a magical function that perfectly solves this task. For example, if we want to distinguish cats and dogs, we can train a neural network that hopefully converges over time to a function similar to our magical function.
The problem is now: How can we help/encourage our network to converge to a good/better function? In theory, a single layer plus a non-linearity can be enough (this is the universal approximation theorem), and there are certainly infinitely many good solutions to our problem; in practice, however, such a single layer will most likely never converge to a meaningful solution.
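To make that concrete, here is a minimal sketch of such a one-hidden-layer network in PyTorch; all sizes are hypothetical placeholders:

```python
import torch.nn as nn

# A single hidden layer plus a non-linearity: by the universal approximation
# theorem this family can represent our "magical function" arbitrarily well,
# given enough hidden units. All sizes here are hypothetical.
model = nn.Sequential(
    nn.Flatten(),                  # flatten e.g. a 64x64 RGB image
    nn.Linear(3 * 64 * 64, 1024),  # the single hidden layer
    nn.ReLU(),                     # the non-linearity
    nn.Linear(1024, 2),            # logits for cat vs. dog
)
```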
Of course, there is a wide variety of tools one can deploy, for example changing the overall architecture; however, I am less interested in that at the moment. I am more interested in the functions themselves and the properties they have. I only know of the following desirable properties that functions/layers might have. Surely there are more; what would be others to look into?
- Equivariance to a certain transformation (see the convolution sketch after this list)
- Sparse weights
- Regularization that bounds the weights, e.g. weight clipping
- Lipschitz continuity (the training-loop sketch after this list touches the last three items)
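For the first property, a convolution is the standard example: it is equivariant to translations, i.e. shifting the input and then applying the layer gives the same result as applying the layer and then shifting its output. A minimal check in PyTorch (circular padding makes the equivariance exact):

```python
import torch
import torch.nn as nn

# A convolution commutes with translations of its input.
conv = nn.Conv2d(1, 4, kernel_size=3, padding=1,
                 padding_mode="circular", bias=False)

x = torch.randn(1, 1, 16, 16)
shift = lambda t: torch.roll(t, shifts=2, dims=-1)  # shift 2 pixels right

out_a = conv(shift(x))  # transform the input, then apply the layer
out_b = shift(conv(x))  # apply the layer, then transform the output
print(torch.allclose(out_a, out_b, atol=1e-6))  # True
```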
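The last three items can all be encouraged during training. Below is a rough sketch, not a recommendation of specific values: an L1 penalty pushes weights toward zero (sparsity), in-place clamping bounds the weights (as in the original WGAN), and spectral normalization constrains a layer's Lipschitz constant. All sizes and coefficients are hypothetical:

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Hypothetical model and data, just to make the snippet self-contained.
model = nn.Sequential(
    spectral_norm(nn.Linear(128, 64)),  # Lipschitz constant of this layer <= 1
    nn.ReLU(),
    nn.Linear(64, 2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 128), torch.randint(0, 2, (32,))

for _ in range(10):
    optimizer.zero_grad()
    task_loss = nn.functional.cross_entropy(model(x), y)

    # Sparse weights: an L1 penalty drives many weights toward zero.
    l1_penalty = sum(p.abs().sum() for p in model.parameters())
    loss = task_loss + 1e-4 * l1_penalty  # 1e-4 is a hypothetical strength

    loss.backward()
    optimizer.step()

    # Bounded weights via clipping, as in the original WGAN.
    with torch.no_grad():
        for p in model.parameters():
            p.clamp_(-0.1, 0.1)
```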
Topic: objective-function, convergence, deep-learning
Category: Data Science