Batch normalization for image CNN - Why not use the mean of the entire batch?

Question

For a CNN that recognizes images, why not use the entire batch of data, instead of per-feature statistics, to calculate the mean in Batch Normalization?

When features are independent, per-feature statistics make sense. However, the features (pixels) of images with RGB channels at 8 bits per color are related. If the R channel of an image has 256 pixels, a value of 255 at pixel i and a value of 255 at pixel j both mean white, i.e. the same intensity of red.

Then why not use the mean of all the data in a batch? If pixel channel i happens to take values in (0, 127) and channel j takes values in (128, 255), per-feature normalization loses both the fact that (0, 127) sits inside the common [0, 255] range and the relational meaning between i and j (namely, that pixel i's intensity is lower than pixel j's).
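To make the two options in the question concrete, here is a minimal NumPy sketch (with a made-up toy batch) contrasting the per-channel statistics batch norm computes with the single whole-batch mean the question proposes:

```python
import numpy as np

# Toy batch: 4 images, 3 channels, 2x2 pixels, 8-bit values in [0, 255].
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(4, 3, 2, 2)).astype(np.float64)

# Per-feature (per-channel) statistics, as batch norm computes them for images:
# one mean per channel, averaged over the batch and spatial dimensions.
per_channel_mean = x.mean(axis=(0, 2, 3))   # shape (3,) - one value per channel

# The alternative the question proposes: one mean over the entire batch.
whole_batch_mean = x.mean()                 # a single scalar

print(per_channel_mean)   # three (generally different) values
print(whole_batch_mean)   # one value shared by all channels
```

Under per-channel normalization, each channel is shifted by its own mean, so two channels with different value ranges end up on the same scale; under the whole-batch mean, their relative offset is preserved, which is exactly the trade-off the question is asking about.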

Topic batch-normalization

Category Data Science


The case you mention would indeed apply if you used BN on the input layer.

However, the point is that BN is mostly applied after convolutional or fully-connected layers (before the activation layer). So BN is computed not over pixel values but over the outputs of conv or FC layers, which are unbounded (a value can be anything from -inf to +inf).

Moreover, the outputs of a conv/FC layer live in completely different dimensions (since different kernels were applied), and therefore can't really be treated as "somehow related" to each other.
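The point above can be sketched as follows: each channel of a (hypothetical) conv output comes from a different kernel, so the channels can have wildly different scales, and batch norm deliberately normalizes each one with its own statistics. A NumPy sketch of the standard per-channel formulation:

```python
import numpy as np

# Hypothetical conv-layer output: unbounded activations, one channel per kernel.
# The three channels are given deliberately different locations and scales.
rng = np.random.default_rng(1)
y = rng.normal(loc=[-5.0, 0.0, 40.0], scale=[0.1, 1.0, 10.0],
               size=(8, 2, 2, 3)).transpose(0, 3, 1, 2)  # (N, C, H, W)

# Batch-norm statistics are computed per channel, over batch and spatial dims.
mean = y.mean(axis=(0, 2, 3), keepdims=True)
var = y.var(axis=(0, 2, 3), keepdims=True)
y_hat = (y - mean) / np.sqrt(var + 1e-5)

# After normalization every channel has (approximately) zero mean and unit
# variance, despite the very different scales the kernels produced.
print(y_hat.mean(axis=(0, 2, 3)))
print(y_hat.std(axis=(0, 2, 3)))
```

A single whole-batch mean would instead be dominated by the largest-scale channel and leave the small-scale channels essentially un-normalized.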

Edited: added fully-connected layers
