Using batchnorm and dropout simultaneously?

I am a bit confused about the relationship between the terms Dropout and BatchNorm. As I understand it:

  1. Dropout is a regularization technique that is used only during training.

  2. BatchNorm is a technique used to accelerate training, improve accuracy, etc. But I have also seen conflicting opinions on the question: is BatchNorm a regularization technique?

So, can somebody please answer the following questions:

  1. Is BatchNorm a regularization technique? Why?

  2. Should we use BatchNorm only during the training process? Why?

  3. Can we use Dropout and BatchNorm simultaneously? If so, in what order?

Topic batch-normalization dropout neural-network machine-learning

Category Data Science


  1. Batch normalization can be interpreted as an implicit regularization technique because it can be decomposed into a population normalization term and a gamma decay term, the latter being a form of regularization. This was described in the article Towards Understanding Regularization in Batch Normalization, which was presented at the ICLR'19 conference. (The standard batch-norm transform, and what the gamma in "gamma decay" refers to, is written out after this list.)

  2. Batch normalization happens at training time. However, at inference time we still apply the normalization, but with the mean and variance statistics accumulated during training, not with those of the current batch (see the sketch after this list). The Wikipedia page for Batch normalization gives a nice description of this.

  3. It is possible to use both dropout and batch normalization in the same network, with no single mandated ordering (a combined example is sketched after this list). However, in some setups, the performance of their combination is worse than applying them separately. This was studied in the article Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift, presented at the CVPR'19 conference:

    [...] Dropout shifts the variance of a specific neural unit when we transfer the state of that network from training to test. However, BN maintains its statistical variance, which is accumulated from the entire learning procedure, in the test phase. The inconsistency of variances in Dropout and BN (we name this scheme “variance shift”) causes the unstable numerical behavior in inference that leads to erroneous predictions finally. [...]
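
For reference, this is the standard batch-normalization transform (from the original batch-norm formulation, not a result specific to the articles above). For a feature $x$ in a mini-batch $B$:

$$\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y = \gamma \hat{x} + \beta$$

where $\mu_B$ and $\sigma_B^2$ are the mini-batch mean and variance, $\epsilon$ is a small constant for numerical stability, and $\gamma$, $\beta$ are learnable scale and shift parameters. The "gamma decay" term mentioned in point 1 is a decay-style penalty acting on this $\gamma$, and at inference time $\mu_B$ and $\sigma_B^2$ are replaced by running estimates accumulated during training, which is exactly the switch described in point 2.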
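
To make point 2 concrete, here is a minimal PyTorch sketch (PyTorch is just an assumed choice of framework; the layer and variable names are illustrative) showing that a batch-norm layer normalizes with the batch's own statistics in training mode and with the accumulated running statistics in evaluation mode:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=4)

# A batch whose features deliberately have non-zero mean and non-unit variance.
x = torch.randn(8, 4) * 3 + 5

bn.train()              # training mode: normalize with this batch's mean/variance
y_train = bn(x)         # the forward pass also updates bn.running_mean / bn.running_var

bn.eval()               # inference mode: normalize with the running statistics
y_eval = bn(x)          # same input, different output; no statistics are updated

print(bn.running_mean)        # estimates accumulated during training
print(y_train.mean(dim=0))    # roughly zero per feature (batch statistics were used)
print(y_eval.mean(dim=0))     # generally not zero (running statistics were used)
```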
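
And for point 3, a minimal sketch (again assuming PyTorch) of a network that uses both techniques. The ordering shown here (BatchNorm, then the activation, then Dropout) is only one common choice; the CVPR'19 paper quoted above suggests that, if the two are combined, applying Dropout only after the last batch-norm layer helps mitigate the variance shift:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 128),
    nn.BatchNorm1d(128),   # normalizes pre-activations with batch statistics
    nn.ReLU(),
    nn.Dropout(p=0.5),     # placed after the (last) batch-norm layer
    nn.Linear(128, 10),
)

x = torch.randn(32, 64)

model.train()              # Dropout active, BatchNorm uses batch statistics
logits_train = model(x)

model.eval()               # Dropout disabled, BatchNorm uses running statistics
logits_eval = model(x)
```

Switching between model.train() and model.eval() changes the behavior of both layers at once, and that train-to-test transition is where the variance shift described in the quote arises.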
