Number and size of dense layers in a CNN

Most networks I've seen have one or two dense layers before the final softmax layer.

  • Is there any principled way of choosing the number and size of the dense layers?
  • Are two dense layers more expressive than one, for the same number of parameters?
  • Should dropout be applied before each dense layer, or just once?


A single dense layer applies only an affine transformation, so by itself it can represent nothing more than a linear model (fine if linear regression is all you need).

Two or more layers, with non-linear activations between them, can represent non-linear functions; each extra layer lets the network capture higher-order structure, much as speed and acceleration build on position.
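
To make that concrete, here is a minimal NumPy sketch (the shapes and random weights are arbitrary, purely for illustration) showing that two dense layers without an activation collapse into a single linear map, while a ReLU between them does not:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                  # batch of 4 inputs, 8 features each
W1, b1 = rng.normal(size=(8, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 3)), rng.normal(size=3)

# Two linear layers collapse into one linear layer:
two_linear = (x @ W1 + b1) @ W2 + b2
W, b = W1 @ W2, b1 @ W2 + b2
print(np.allclose(two_linear, x @ W + b))    # True: no extra expressiveness

# A ReLU in between breaks the collapse and adds non-linearity:
two_relu = np.maximum(x @ W1 + b1, 0) @ W2 + b2
print(np.allclose(two_relu, x @ W + b))      # False
```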


First of all:

There is no way to determine a good network topology just from the number of inputs and outputs. It depends critically on the number of training examples and the complexity of the classification you are trying to learn.[1]

Yoshua Bengio has proposed a very simple rule:

Just keep adding layers until the test error does not improve anymore.[2]
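
That rule translates directly into a loop. Below is a hedged sketch (MNIST is used only as placeholder data, and the width of 128, the 3 epochs, and the depth cap of 5 are arbitrary choices) that keeps adding dense layers until the held-out error stops improving:

```python
import tensorflow as tf

# Placeholder data (MNIST) just to make the sketch self-contained;
# the test split stands in for a validation set here.
(x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0

def build_model(n_dense_layers, width=128):
    model = tf.keras.Sequential([tf.keras.layers.Flatten(input_shape=(28, 28))])
    for _ in range(n_dense_layers):
        model.add(tf.keras.layers.Dense(width, activation="relu"))
    model.add(tf.keras.layers.Dense(10, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

best_err, best_n = float("inf"), 0
for n in range(1, 6):                              # try 1..5 dense layers
    hist = build_model(n).fit(x_train, y_train, epochs=3,
                              validation_data=(x_val, y_val), verbose=0)
    err = 1.0 - hist.history["val_accuracy"][-1]
    if err >= best_err:                            # stop once depth stops helping
        break
    best_err, best_n = err, n
print("chosen number of dense layers:", best_n)
```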

Moreover:

The earlier features of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful to many tasks, but later layers of the ConvNet becomes progressively more specific to the details of the classes contained in the original dataset.[3]
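
This is exactly why transfer learning works: keep the generic early layers and retrain only a task-specific dense head. A minimal Keras sketch (VGG16 as the base, the 256-unit dense layer, and the 10 output classes are all assumptions made for illustration):

```python
import tensorflow as tf

# Reuse generic convolutional features; train only a new dense head.
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))
base.trainable = False                       # freeze the generic early layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),   # task-specific layer
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```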

The same hierarchy shows up in methods that learn feature detectors from data:

first layer learns edge detectors and subsequent layers learn more complex features, and higher level layers encode more abstract features. [4]

So, using two dense layers is generally preferable to a single one.

Finally:

The original paper on Dropout provides a number of useful heuristics to consider when using dropout in practice. One of them is: Use dropout on incoming (visible) as well as hidden units. Application of dropout at each layer of the network has shown good results. [5]

In a CNN, a Dropout layer is usually applied after each pooling layer and also after each Dense layer. A good tutorial can be found at [6].
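
A minimal sketch of that placement (the filter counts, the 0.25/0.5 dropout rates, the 32x32x3 input, and the 10 classes are all placeholder choices):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                           input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.25),           # dropout after pooling

    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.25),           # dropout after pooling

    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),            # dropout after the dense layer
    tf.keras.layers.Dense(10, activation="softmax"),
])
```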

References:

[1] https://www.cs.cmu.edu/Groups/AI/util/html/faqs/ai/neural/faq.html

[2] Bengio, Yoshua. "Practical recommendations for gradient-based training of deep architectures." Neural networks: Tricks of the trade. Springer Berlin Heidelberg, 2012. 437-478.

[3] http://cs231n.github.io/transfer-learning/

[4] http://learning.eng.cam.ac.uk/pub/Public/Turner/Teaching/ml-lecture-3-slides.pdf

[5] https://machinelearningmastery.com/dropout-regularization-deep-learning-models-keras/

[6] https://cambridgespark.com/content/tutorials/convolutional-neural-networks-with-keras/index.html
