Equations in "Batch normalization: theory and how to use it with Tensorflow"
I read the article Batch normalization: theory and how to use it with Tensorflow by Federico Peccia.
The batch normalized activation is $$ \bar x_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} $$ where $\mu_B = \frac{1}{m} \sum_{i=1}^m x_i$ is the batch mean and $\sigma_B^2 = \frac{1}{m} \sum_{i=1}^m (x_i - \mu_B)^2$ is the batch variance. The scaled and shifted activation is $y_i = \gamma \bar x_i + \beta$ where $\gamma$ and $\beta$ are parameters that the neural network learns.
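For concreteness, here is a minimal NumPy sketch of the training-time computation these definitions describe (the function name and shapes are mine, not from the article):

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization for a (m, features) activation matrix.

    Follows the equations above: per-feature batch mean and variance,
    normalization, then the learned scale (gamma) and shift (beta).
    """
    mu_b = x.mean(axis=0)                       # batch mean, one value per feature
    var_b = x.var(axis=0)                       # batch variance with the 1/m factor, as defined
    x_hat = (x - mu_b) / np.sqrt(var_b + eps)   # normalized activations
    y = gamma * x_hat + beta                    # scaled and shifted output
    return y, mu_b, var_b

# Example: a mini-batch of m=4 samples with 3 features
x = np.random.randn(4, 3)
y, mu_b, var_b = batch_norm_train(x, gamma=np.ones(3), beta=np.zeros(3))
```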
After these definitions, the article presents a set of equations for inference: the inference mean $E_x$, the inference variance $\mathrm{Var}_x$, and the output $y$ written as a single affine transform of $x$.
I think there are mistakes in them.
Suppose there are $j$ batches, each of size $m$. I think the first equation (inference mean) is the average of the batch means, so it should be $$ E_x = \frac{1}{j} \sum_{i=1}^{j} \mu_B^{(i)} $$ because this is the mean over the $j$ mini-batches.
Similarly, for the second equation (inference variance) I would expect the average of the $j$ batch variances, as in the sketch below. What are the correct formulas?
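To make the question concrete, this is the estimator I have in mind, written as a NumPy sketch over recorded per-batch statistics (the arrays here are placeholders for illustration). I am aware that the original batch normalization paper applies an $\frac{m}{m-1}$ correction to make the variance estimate unbiased, which is part of what I am asking about:

```python
import numpy as np

# Per-batch statistics recorded from j mini-batches of size m
# (placeholder values, just to illustrate the shapes).
j, m, features = 5, 32, 3
batch_means = np.random.randn(j, features)         # mu_B for each of the j batches
batch_vars = np.abs(np.random.randn(j, features))   # sigma_B^2 for each of the j batches

# Plain averages over the j batches -- my reading of the inference statistics.
E_x = batch_means.mean(axis=0)            # (1/j) * sum of mu_B
Var_x = batch_vars.mean(axis=0)           # (1/j) * sum of sigma_B^2

# The original paper additionally applies Bessel's correction,
# i.e. Var_x = m/(m-1) * (average of the batch variances).
Var_x_unbiased = (m / (m - 1)) * Var_x
```

(In practice, as far as I understand, TensorFlow keeps exponential moving averages of the batch statistics rather than an exact average over all batches, but that does not affect the question about the formulas.)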
In the third equation, shouldn't it be $$ y = \gamma \bar x + \beta = \gamma \frac{x - E_x}{\sqrt{\mathrm{Var}_x + \epsilon}} + \beta = \frac{\gamma}{\sqrt{\mathrm{Var}_x + \epsilon}} x + \beta - \frac{\gamma E_x}{\sqrt{\mathrm{Var}_x + \epsilon}} ? $$
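To check my reading of the third equation, here is a quick NumPy sketch (all values made up) verifying that the folded affine form is numerically equivalent to normalizing and then scaling:

```python
import numpy as np

eps = 1e-5
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3))
gamma, beta = rng.normal(size=3), rng.normal(size=3)
E_x, Var_x = rng.normal(size=3), np.abs(rng.normal(size=3))

# Direct form: normalize with the inference statistics, then scale and shift.
y_direct = gamma * (x - E_x) / np.sqrt(Var_x + eps) + beta

# Folded form: precompute a single scale and shift, as in the third equation.
scale = gamma / np.sqrt(Var_x + eps)
shift = beta - gamma * E_x / np.sqrt(Var_x + eps)
y_folded = scale * x + shift

assert np.allclose(y_direct, y_folded)  # the two forms agree numerically
```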
The Wikipedia article on batch normalization seems to confirm my reading.
Tags: mathematics, batch-normalization