To freeze or not: batch normalisation in ResNet when transfer learning

I'm using a ResNet50 model pretrained on ImageNet to do transfer learning, fitting an image classification task. The easy way of doing this is simply to freeze the conv layers (or really all layers except the final fully connected layer). However, I came across a paper where the authors mention that batch normalisation layers should be fine-tuned when fitting the new model: Few layers such as Batch Normalization (BN) layers shouldn't be frozen because the mean and variance of the …
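
A minimal Keras sketch of that suggestion, assuming a ResNet50 base and a new dense head (the number of classes and the optimizer are placeholders):

    import tensorflow as tf

    base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False, pooling="avg")

    # Freeze everything except the BatchNormalization layers, so their statistics
    # and scale/shift parameters can adapt to the new data distribution.
    for layer in base.layers:
        layer.trainable = isinstance(layer, tf.keras.layers.BatchNormalization)

    num_classes = 10  # placeholder for the new task
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

Note that in TF 2.x Keras, a BatchNormalization layer with trainable=False also runs in inference mode (it keeps using its ImageNet moving statistics), so the fully frozen variant and the BN-trainable variant genuinely behave differently.
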
Category: Data Science

Batch normalization

Part 1: I'm going through this article and wanted to try to calculate a forward and backward pass with batch normalization. When doing the steps after the first layer I get a batch norm output that is equal for all features. Here is the code (I have on purpose done it in very small steps): w = np.array([[0.3, 0.4],[0.5,0.1],[0.2,0.3]]) X = np.array([[0.7,0.1],[0.3,0.8],[0.4,0.6]]) def mu(x,axis=0): return np.mean(x,axis=axis) def sigma(z, mu): Ai = np.sum(z,axis=0) return np.sqrt((1/len(Ai)) * (Ai-mu)**2) def Ai(z): return np.sum(z,axis=0) …
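
For comparison, a minimal NumPy sketch of the standard forward pass, normalizing each feature (column) over the batch rather than summing the activations first; gamma, beta and eps are placeholder values:

    import numpy as np

    def batchnorm_forward(z, gamma, beta, eps=1e-5):
        # z has shape (batch_size, n_features); statistics are taken per feature (axis=0)
        mu = z.mean(axis=0)
        var = z.var(axis=0)
        z_hat = (z - mu) / np.sqrt(var + eps)
        return gamma * z_hat + beta

    # The question's X, treated as a (3, 2) batch of pre-activations
    X = np.array([[0.7, 0.1], [0.3, 0.8], [0.4, 0.6]])
    out = batchnorm_forward(X, gamma=np.ones(2), beta=np.zeros(2))
    print(out)  # each column now has (approximately) zero mean and unit variance
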
Category: Data Science

Should batch normalization make my eval inference so dependent on the batch size?

I am using PyTorch, and the relevant piece of code, from my .forward call, is below: class ModelDense(nn.Module): def __init__(self, raw_features, n, features): super(ModelDense, self).__init__() self.linear_pre = nn.Linear(raw_features, features) self.batchnorm_pre = nn.BatchNorm1d(features) self.tower = ResTowerDense(n, features) self.value_linear1 = nn.Linear(features, features) self.value_batchnorm = nn.BatchNorm1d(features) self.value_linear2 = nn.Linear(features, 1) def forward(self, x, mask0, mask1): y = self.tower(self.batchnorm_pre(self.linear_pre(x))) v = torch.sigmoid(self.value_linear2(self.value_batchnorm(F.relu(self.value_linear1(y))))) Here 'self.tower' is a tower of residual blocks. The output in question is 'v', which is just a sigmoid activation. After training …
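
One common culprit for batch-size-dependent inference (a guess, since the evaluation loop isn't shown) is evaluating while the model is still in training mode, so BatchNorm keeps normalizing with the current batch's statistics instead of its running averages. A minimal sketch of the difference:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    bn = nn.BatchNorm1d(4)

    bn.train()
    for _ in range(200):              # populate the running mean/var during "training"
        bn(torch.randn(32, 4))

    x = torch.randn(2, 4)
    bn.train()
    print(bn(x))                      # uses the statistics of this 2-sample batch
    bn.eval()
    print(bn(x))                      # uses the stored running statistics: independent of batch size
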
Category: Data Science

Why doesn't batch normalization 'zero out' a batch of size one?

I'm using Tensorflow. Consider the example below: >>> x <tf.Tensor: shape=(1,), dtype=float32, numpy=array([-0.22630838], dtype=float32)> >>> tf.keras.layers.BatchNormalization()(x) <tf.Tensor: shape=(1,), dtype=float32, numpy=array([-0.22619529], dtype=float32)> There doesn't seem to be any change at all, besides maybe some perturbation due to epsilon. Shouldn't a normalized sample of size one just be the zero tensor? I figured maybe there was some problem with the fact that the batch size is 1 (the variance is zero in this case, so how do you make the variance 1?). But …
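
A hedged explanation of what the example is hitting: called directly (outside fit()), the Keras layer defaults to training=False, so it normalizes with its freshly initialized moving statistics (mean 0, variance 1), giving roughly x / sqrt(1 + epsilon), which is exactly the tiny perturbation seen above. Passing training=True uses the batch statistics and does zero out a single sample:

    import tensorflow as tf

    x = tf.constant([[-0.22630838]])  # reshaped to (batch=1, features=1) to make the batch axis explicit
    bn = tf.keras.layers.BatchNormalization()

    print(bn(x, training=False))  # ~ x / sqrt(1 + eps): moving mean 0 and moving variance 1 are used
    print(bn(x, training=True))   # ~ [[0.]]: the batch mean/variance of the single sample are used
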
Category: Data Science

Compute gradients in parallel

Here is part of my code: class SimpleNet(nn.Module): def __init__(self): super().__init__() self.linear1 = nn.Linear(2, 1, bias=False) self.linear2 = nn.Linear(1, 2, bias=False) def forward(self, x): z = self.linear1(x) y_pred = self.linear2(z) return y_pred, z model = SimpleNet().cuda() for epoch in range(1): model.train() for i, dt in enumerate(data.trn_dl): optimizer.zero_grad() output = model(dt[0]) loss2 = 0 for j in range(0,len(output[0])): l1 = torch.autograd.grad(output[0][j][0], output[1], create_graph=True)[0][j] l2 = torch.autograd.grad(output[0][j][1], output[1], create_graph=True)[0][j] loss2 = loss2 + abs(torch.sqrt(l1**2+l2**2)-1) loss1 = F.mse_loss(output[0], dt[1]) loss = loss1+loss2 loss.backward() …
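
A sketch of one way to avoid the per-sample Python loop (the "sum trick"): because sample j's output depends only on that sample's intermediate z[j], differentiating the summed outputs with respect to z yields all per-sample gradients in a single autograd call. Names mirror the question; the random data is a placeholder:

    import torch
    import torch.nn as nn

    class SimpleNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.linear1 = nn.Linear(2, 1, bias=False)
            self.linear2 = nn.Linear(1, 2, bias=False)
        def forward(self, x):
            z = self.linear1(x)
            return self.linear2(z), z

    model = SimpleNet()
    y_pred, z = model(torch.randn(8, 2))

    # grad of a sum = batched per-sample grads, since the samples don't interact
    g1 = torch.autograd.grad(y_pred[:, 0].sum(), z, create_graph=True)[0]  # shape [8, 1]
    g2 = torch.autograd.grad(y_pred[:, 1].sum(), z, create_graph=True)[0]  # shape [8, 1]
    loss2 = (torch.sqrt(g1 ** 2 + g2 ** 2) - 1).abs().sum()
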
Category: Data Science

Normalization in production

I am currently writing a machine learning pipeline for my time series application. At the end of each month, I get the data gathered, normalize it to [0, 1], retrain the ML model with the new observation only, and predict future values. Question: Should I be reading the entire dataset each time I get a new observation, normalize the entire dataset, create the ML model, then predict? How I got stuck: let's say I have 1 feature and at t-1 all …
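
One common pattern (a sketch, not the only valid design): fit the scaler once on the training history, persist it next to the model, and only transform new observations with it, refitting scaler and model together on a schedule if the observed range drifts. The file name is a placeholder:

    import joblib
    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    history = np.array([[10.0], [12.5], [11.2], [13.8]])    # toy training history, 1 feature

    scaler = MinMaxScaler(feature_range=(0, 1)).fit(history)
    joblib.dump(scaler, "scaler.joblib")                    # persist alongside the trained model

    # ...next month, at prediction time...
    scaler = joblib.load("scaler.joblib")
    new_obs = np.array([[14.1]])
    x = scaler.transform(new_obs)   # may fall outside [0, 1] if the range drifted, a signal to refit
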
Category: Data Science

Equations in "Batch normalization: theory and how to use it with Tensorflow"

I read the article Batch normalization: theory and how to use it with Tensorflow by Federico Peccia. The batch normalized activation is $$ \bar x_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} $$ where $\mu_B = \frac{1}{m} \sum_{i=1}^m x_i$ is the batch mean and $\sigma_B^2 = \frac{1}{m} \sum_{i=1}^m (x_i - \mu_B)^2$ is the batch variance. The scaled and shifted activation is $y_i = \gamma \bar x_i + \beta$ where $\gamma$ and $\beta$ are parameters that the neural network learns. After these …
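
The two formulas can be checked numerically with a short NumPy sketch (the batch, gamma and beta are arbitrary example values):

    import numpy as np

    x = np.array([0.2, 1.5, -0.7, 3.1])           # activations of one unit over a batch of m = 4
    gamma, beta, eps = 1.5, 0.3, 1e-5

    mu_B = x.mean()                                # batch mean
    sigma2_B = ((x - mu_B) ** 2).mean()            # batch variance
    x_bar = (x - mu_B) / np.sqrt(sigma2_B + eps)   # normalized activation
    y = gamma * x_bar + beta                       # scaled and shifted activation

    print(x_bar.mean(), x_bar.var())               # ~0 and ~1
    print(y.mean(), y.var())                       # ~beta and ~gamma**2
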
Category: Data Science

Batch normalization backpropagation doubts

I have recently studied the batch normalization layer and its backpropagation process, using as my main sources the original paper and this website showing part of the derivation process, but there is a step that isn't covered there which I don't really understand, namely (using the notation of the website) when computing: $$ \frac{\partial \widehat{x}_i}{\partial x_i} = \frac{\partial}{\partial x_i} \frac{x_i - \mu}{\sqrt{\sigma^2+\epsilon}} = \frac{1}{\sqrt{\sigma^2+\epsilon}} $$ Applying the quotient rule I expected the following (since $\mu$ and …
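
For reference, the quoted step is taken with $\mu$ and $\sigma^2$ held fixed; their own dependence on the inputs enters through separate branches of the chain rule. If instead the full derivative is taken in one go (with $m$ the batch size and $\delta_{ij}$ the Kronecker delta), the standard result is
$$ \frac{\partial \widehat{x}_i}{\partial x_j} = \frac{\delta_{ij} - \frac{1}{m}}{\sqrt{\sigma^2+\epsilon}} - \frac{(x_i - \mu)(x_j - \mu)}{m\,(\sigma^2+\epsilon)^{3/2}}, $$
and the branch-by-branch derivation recombines to the same expression.
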
Category: Data Science

Poor CNN performance after implementing BatchNormalization

I am training a CNN to classify malware images from a dataset named Malimg. Before implementing the BatchNormalization layer, I was getting an accuracy of 95.57% (see below for the graph of loss/accuracy and validation loss/accuracy): Epoch 1/10 6537/6537 [==============================] - 53s 8ms/step - loss: 1.7711 - accuracy: 0.4605 - val_loss: 1.0062 - val_accuracy: 0.6510 Epoch 2/10 6537/6537 [==============================] - 52s 8ms/step - loss: 0.8739 - accuracy: 0.7150 - val_loss: 0.4965 - val_accuracy: 0.8426 Epoch 3/10 6537/6537 [==============================] - 52s …
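
Without the full architecture it is hard to diagnose, but a useful baseline to compare against is the common Conv → BatchNorm → ReLU ordering, with the conv bias dropped since BatchNorm makes it redundant. A minimal Keras sketch (input shape and filter counts are placeholders, assuming the 25 Malimg families):

    import tensorflow as tf
    from tensorflow.keras import layers

    num_classes = 25  # Malimg malware families
    model = tf.keras.Sequential([
        layers.Conv2D(32, 3, padding="same", use_bias=False, input_shape=(64, 64, 1)),
        layers.BatchNormalization(),   # normalize the pre-activation, then apply the nonlinearity
        layers.ReLU(),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
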
Category: Data Science

Explanation of Karpathy tweet about common mistakes. #5: "you didn't use bias=False for your Linear/Conv2d layer when using BatchNorm"

I recently found this twitter thread from Andrej Karpathy. In it he states a few common mistakes made during the development of a neural network: you didn't try to overfit a single batch first. you forgot to toggle train/eval mode for the net. you forgot to .zero_grad() (in PyTorch) before .backward(). you passed softmaxed outputs to a loss that expects raw logits. you didn't use bias=False for your Linear/Conv2d layer when using BatchNorm, or conversely forget to include it for the output …
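
The reasoning behind mistake #5, with a sketch: BatchNorm subtracts the per-channel mean right after the layer, so any constant bias the Linear/Conv2d adds is immediately cancelled, and BatchNorm's own learnable beta already provides a shift. The bias is therefore wasted parameters rather than harmful:

    import torch.nn as nn

    block = nn.Sequential(
        # The conv bias would be subtracted away by BatchNorm's per-channel mean,
        # and BatchNorm's beta supplies the learnable shift instead, so drop it.
        nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
    )
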
Category: Data Science

Batch normalization for image CNN - Why not use the mean of the entire batch?

Question: For a CNN recognizing images, why not use the entire batch of data, instead of per-feature statistics, to calculate the mean in Batch Normalization? When each feature is independent, per-feature statistics are needed. However, the features (pixels) of images having RGB channels with 8-bit color for a CNN are related. If there are 256 pixels in the R channel of an image, 255 for pixel i and 255 for pixel j are both white, meaning the same intensity(?) in …
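
For context, what BatchNorm actually computes for images (a small check, not by itself an answer to the "why"): one mean and variance per channel, taken over the batch and both spatial dimensions, rather than per pixel or over the whole batch tensor:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    x = torch.randn(8, 3, 32, 32)                  # (batch, channels, height, width)

    bn = nn.BatchNorm2d(3, affine=False)           # no gamma/beta, to expose the raw normalization
    out = bn(x)

    manual = (x - x.mean(dim=(0, 2, 3), keepdim=True)) / torch.sqrt(
        x.var(dim=(0, 2, 3), unbiased=False, keepdim=True) + bn.eps
    )
    print(torch.allclose(out, manual, atol=1e-5))  # True: one mean/var per channel over N, H, W
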
Category: Data Science

Sequential batch processing vs parallel batch processing?

In deep learning based model training, a batch of inputs is generally passed. For example, for training a deep learning model with a [512]-dimensional input feature vector, say for batch size = 4, we mainly pass a [4, 512]-dimensional input. I am curious what the logical significance is of passing the same input after flattening it across the batch and channel dimensions to [2048]. Logically the locality structure will be destroyed, but will it significantly speed up my implementation? And can it …
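
A small sketch of why the batch dimension is already processed in parallel, and why flattening [4, 512] into [2048] changes the problem rather than merely the speed (shapes follow the question):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    layer = nn.Linear(512, 128)
    x = torch.randn(4, 512)

    batched = layer(x)                                      # one vectorized call over the whole batch
    looped = torch.stack([layer(sample) for sample in x])   # logically the same, just slower
    print(torch.allclose(batched, looped, atol=1e-6))       # True

    # Flattening mixes the four samples into one 2048-dim vector: the layer would
    # need in_features=2048 and every output would depend on all four inputs.
    flat = x.reshape(1, 2048)
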
Category: Data Science

Batch normalization for multiple datasets?

I am working on a task of generating synthetic data to help the training of my model. This means that the training is performed on synthetic + real data, and tested on real data. I was told that batch normalization layers might be trying to find weights that are good for all while training, which is a problem since the distribution of my synthetic data is not exactly equal to the distribution of the real data. So, the idea would …
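
One way to make the idea concrete is "domain-specific" batch normalization: share all other weights but keep separate BN statistics for synthetic and real batches. A hypothetical sketch (the module name and flag are made up for illustration):

    import torch
    import torch.nn as nn

    class DomainBatchNorm(nn.Module):
        """Separate BN statistics per data source; everything around it stays shared."""
        def __init__(self, num_features):
            super().__init__()
            self.bn_real = nn.BatchNorm1d(num_features)
            self.bn_synth = nn.BatchNorm1d(num_features)

        def forward(self, x, is_synthetic):
            return self.bn_synth(x) if is_synthetic else self.bn_real(x)

    # Feed synthetic and real batches separately and pass the flag along;
    # at test time only bn_real and its running statistics are used.
    layer = DomainBatchNorm(16)
    out = layer(torch.randn(8, 16), is_synthetic=True)
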
Category: Data Science

Does Batch Normalization make sense for a ReLU activation function?

Batch Normalization is described in this paper as a normalization of the input to an activation function with scale and shift variables $\gamma$ and $\beta$. This paper mainly describes using the sigmoid activation function, which makes sense. However, it seems to me that feeding an input from the normalized distribution produced by the batch normalization into a ReLU activation function of $\max(0,x)$ is risky if $\beta$ does not learn to shift most of the inputs past 0 such that the …
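
A quick numerical illustration of the concern: with the default initialization ($\gamma=1$, $\beta=0$) roughly half of the normalized pre-activations are negative, so ReLU zeroes about half of the units; a learned positive $\beta$ shifts that fraction down:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    x = torch.randn(10_000, 1) * 3 + 5              # arbitrary un-normalized pre-activations

    bn = nn.BatchNorm1d(1)                          # gamma initialized to 1, beta to 0
    print((torch.relu(bn(x)) == 0).float().mean())  # ~0.5: about half the units are zeroed

    with torch.no_grad():
        bn.bias.fill_(1.0)                          # pretend beta has learned a positive shift
    print((torch.relu(bn(x)) == 0).float().mean())  # ~0.16: far fewer zeroed units
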
Category: Data Science

How batch normalization layer resolve the vanishing gradient problem?

According to this article: https://towardsdatascience.com/the-vanishing-gradient-problem-69bf08b15484 The vanishing gradient problem occurs when using the sigmoid activation function, because sigmoid maps a large input space into a small space, so the gradient of big values will be close to zero. The article suggests using a batch normalization layer. I can't understand how that works. When using normalization, big values still get big values in another scope (instead of [-inf, inf] they will get [0..1] or [-1..1]), so in the same cases the values …
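
A tiny numerical illustration of the mechanism (not a complete answer): the sigmoid's derivative is $\sigma(x)(1-\sigma(x))$, which is near zero for large $|x|$, and batch normalization keeps the pre-activations in the region where that derivative is still appreciable (the learned $\gamma$ and $\beta$ can move them, but they start at 1 and 0):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):
        s = sigmoid(x)
        return s * (1.0 - s)

    for v in [0.0, 1.0, 5.0, 15.0]:
        print(v, sigmoid_grad(v))
    # 0.0  -> 0.25
    # 1.0  -> ~0.20
    # 5.0  -> ~0.0066
    # 15.0 -> ~3e-7   (gradients this small vanish after a few layers)
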
Category: Data Science

Using batchnorm and dropout simultaneously?

I am a bit confused about the relation between the terms "Dropout" and "BatchNorm". As I understand it, Dropout is a regularization technique that is used only during training. BatchNorm is a technique used to accelerate training, improve accuracy, etc. But I have also seen some conflicting opinions on the question: is BatchNorm a regularization technique? So can somebody please answer some questions: Is BatchNorm a regularization technique? Why? Should we use BatchNorm only during the training process? Why? Can we use Dropout and BatchNorm simultaneously? …
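
Whatever position one takes on whether BatchNorm "counts" as regularization, in practice both layers stay in the model at test time and only change behaviour through train()/eval(); a minimal sketch of using them together:

    import torch
    import torch.nn as nn

    net = nn.Sequential(
        nn.Linear(16, 32),
        nn.BatchNorm1d(32),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(32, 2),
    )

    x = torch.randn(8, 16)
    net.train()   # Dropout masks units; BatchNorm uses batch statistics and updates running stats
    y_train = net(x)
    net.eval()    # Dropout is a no-op; BatchNorm uses its stored running statistics
    y_eval = net(x)
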
Category: Data Science

Can Batch Normalization replace tanh in RNN?

Question: Can Batch Normalization (BN) be inserted in an RNN after $x_t@W_{xh}$ and after $h_{t-1}@W_{hh}$, to remove $f=\tanh$ and the bias $b_h$? If possible, will this eliminate both the exploding and vanishing gradient problems? I believe the effect of tanh, which squashes values from $(-\infty, +\infty)$ into $(-1, 1)$, can be replaced with the standardization in BN, and that it makes the bias unnecessary at $x_t@W_{xh}$ and $h_{t-1}@W_{hh}$. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift The auto differentiation of …
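
A sketch of where the question proposes to insert BN in a vanilla RNN cell (this mirrors the proposal, not a claim that it works; note that published recurrent batch normalization keeps the tanh and only normalizes the two projections, since a BN layer in eval mode is just a fixed per-feature affine map, not a bounded nonlinearity):

    import torch
    import torch.nn as nn

    class BNRNNCellSketch(nn.Module):
        """Hypothetical cell: BN after x_t @ W_xh and after h_{t-1} @ W_hh, no biases."""
        def __init__(self, input_size, hidden_size):
            super().__init__()
            self.w_xh = nn.Linear(input_size, hidden_size, bias=False)
            self.w_hh = nn.Linear(hidden_size, hidden_size, bias=False)
            self.bn_x = nn.BatchNorm1d(hidden_size)
            self.bn_h = nn.BatchNorm1d(hidden_size)

        def forward(self, x_t, h_prev):
            # tanh kept here; dropping it would leave a purely affine recurrence at inference time
            return torch.tanh(self.bn_x(self.w_xh(x_t)) + self.bn_h(self.w_hh(h_prev)))

    cell = BNRNNCellSketch(input_size=10, hidden_size=20)
    h = cell(torch.randn(4, 10), torch.zeros(4, 20))
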
Category: Data Science

Why does batchnorm1d in Pytorch compute 0 with the following example (2 lines of code)?

Here is the code: import torch import torch.nn as nn x = torch.Tensor([[1, 2, 3], [1, 2, 3]]) print(x) batchnorm = nn.BatchNorm1d(3, eps=0, momentum=0) print(batchnorm(x)) Here is what is printed: tensor([[1., 2., 3.], [1., 2., 3.]]) tensor([[0., 0., 0.], [0., 0., 0.]], grad_fn=<NativeBatchNormBackward>) What I am expecting is the following: using a hand calculation, let $x = (1,2,3)$; then $E(x) = (1+2+3)/3 = 2$ and $Var(x) = (1^2 + 2^2 + 3^2)/3 - 2^2 = 0.666...$, so that the final …
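
The hand calculation normalizes across the three features of one row, but BatchNorm1d normalizes each feature across the batch (dim 0). Since the two rows are identical, every feature's batch mean equals its value and its batch variance is 0, so the output is 0 everywhere. The matching manual computation:

    import torch

    x = torch.Tensor([[1, 2, 3], [1, 2, 3]])

    mean = x.mean(dim=0)                        # per-feature mean over the batch: [1., 2., 3.]
    var = x.var(dim=0, unbiased=False)          # per-feature (biased) variance:   [0., 0., 0.]
    eps = 1e-5                                  # small eps used here, since eps=0 would give 0/0 by hand
    print((x - mean) / torch.sqrt(var + eps))   # all zeros, matching nn.BatchNorm1d
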
Category: Data Science
