Equations in "Batch normalization: theory and how to use it with Tensorflow"
I read the article Batch normalization: theory and how to use it with Tensorflow by Federico Peccia.
The batch normalized activation is $$ \bar x_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} $$ where $\mu_B = \frac{1}{m} \sum_{i=1}^m x_i$ is the batch mean and $\sigma_B^2 = \frac{1}{m} \sum_{i=1}^m (x_i - \mu_B)^2$ is the batch variance. The scaled and shifted activation is $y_i = \gamma \bar x_i + \beta$ where $\gamma$ and $\beta$ are parameters that the neural network learns.
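For concreteness, here is a minimal NumPy sketch of the training-time computation these definitions describe (the function name and shapes are mine, not from the article):

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization for a (m, features) activation matrix.

    Follows the equations above: per-feature batch mean and variance,
    normalization, then the learned scale (gamma) and shift (beta).
    """
    mu_b = x.mean(axis=0)                       # batch mean, one value per feature
    var_b = x.var(axis=0)                       # batch variance with the 1/m factor, as defined
    x_hat = (x - mu_b) / np.sqrt(var_b + eps)   # normalized activations
    y = gamma * x_hat + beta                    # scaled and shifted output
    return y, mu_b, var_b

# Example: a mini-batch of m=4 samples with 3 features
x = np.random.randn(4, 3)
y, mu_b, var_b = batch_norm_train(x, gamma=np.ones(3), beta=np.zeros(3))
```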
After these definitions, the article presents a set of equations for inference: the inference mean $E_x$, the inference variance $\mathrm{Var}_x$, and the output $y$ written as a single affine transform of $x$.
I think there are mistakes in them.
Suppose there are $j$ batches, each of size $m$. I think the first equation (inference mean) is the average of the batch means, so it should be $$ E_x = \frac{1}{j} \sum_{i=1}^{j} \mu_B^{(i)} $$ because this is the mean over the $j$ mini-batches.
Similarly, for the second equation (inference variance) I would expect the average of the $j$ batch variances, as in the sketch below. What are the correct formulas?
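To make the question concrete, this is the estimator I have in mind, written as a NumPy sketch over recorded per-batch statistics (the arrays here are placeholders for illustration). I am aware that the original batch normalization paper applies an $\frac{m}{m-1}$ correction to make the variance estimate unbiased, which is part of what I am asking about:

```python
import numpy as np

# Per-batch statistics recorded from j mini-batches of size m
# (placeholder values, just to illustrate the shapes).
j, m, features = 5, 32, 3
batch_means = np.random.randn(j, features)         # mu_B for each of the j batches
batch_vars = np.abs(np.random.randn(j, features))   # sigma_B^2 for each of the j batches

# Plain averages over the j batches -- my reading of the inference statistics.
E_x = batch_means.mean(axis=0)            # (1/j) * sum of mu_B
Var_x = batch_vars.mean(axis=0)           # (1/j) * sum of sigma_B^2

# The original paper additionally applies Bessel's correction,
# i.e. Var_x = m/(m-1) * (average of the batch variances).
Var_x_unbiased = (m / (m - 1)) * Var_x
```

(In practice, as far as I understand, TensorFlow keeps exponential moving averages of the batch statistics rather than an exact average over all batches, but that does not affect the question about the formulas.)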
In the third equation, shouldn't it be $$ y = \gamma \bar x + \beta = \gamma \frac{x - E_x}{\sqrt{\mathrm{Var}_x + \epsilon}} + \beta = \frac{\gamma}{\sqrt{\mathrm{Var}_x + \epsilon}} x + \beta - \frac{\gamma E_x}{\sqrt{\mathrm{Var}_x + \epsilon}} ? $$
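To check my reading of the third equation, here is a quick NumPy sketch (all values made up) verifying that the folded affine form is numerically equivalent to normalizing and then scaling:

```python
import numpy as np

eps = 1e-5
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3))
gamma, beta = rng.normal(size=3), rng.normal(size=3)
E_x, Var_x = rng.normal(size=3), np.abs(rng.normal(size=3))

# Direct form: normalize with the inference statistics, then scale and shift.
y_direct = gamma * (x - E_x) / np.sqrt(Var_x + eps) + beta

# Folded form: precompute a single scale and shift, as in the third equation.
scale = gamma / np.sqrt(Var_x + eps)
shift = beta - gamma * E_x / np.sqrt(Var_x + eps)
y_folded = scale * x + shift

assert np.allclose(y_direct, y_folded)  # the two forms agree numerically
```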
The Wikipedia article on batch normalization seems to confirm my reading.
Tags: mathematics, batch-normalization