Should batch normalization make my eval inference so dependent on the batch size?
I am using PyTorch; the relevant piece of code, from my model definition and its .forward call, is below:
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModelDense(nn.Module):
    def __init__(self, raw_features, n, features):
        super(ModelDense, self).__init__()
        self.linear_pre = nn.Linear(raw_features, features)
        self.batchnorm_pre = nn.BatchNorm1d(features)
        self.tower = ResTowerDense(n, features)  # stack of residual blocks, defined elsewhere
        self.value_linear1 = nn.Linear(features, features)
        self.value_batchnorm = nn.BatchNorm1d(features)
        self.value_linear2 = nn.Linear(features, 1)

    def forward(self, x, mask0, mask1):
        y = self.tower(self.batchnorm_pre(self.linear_pre(x)))
        v = torch.sigmoid(self.value_linear2(self.value_batchnorm(F.relu(self.value_linear1(y)))))
        return v
Here 'self.tower' is a tower of residual blocks. The output in question is 'v', a single sigmoid-activated value per input row.
After training multiple networks (same topology apart from the width and depth of the tower, and different training hyperparameters), I tested the output by running one input at a time. I made sure to call model.eval() first.
The batch norm layers would throw an error if I passed an input with batch_size == 1, so as a cheat I simply copied my input along dim=0 so that it was batch_size == 2.
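Concretely, my single-input test looks roughly like this (a sketch, not my exact script; x is one input row of shape (1, raw_features), and mask0/mask1 stand in for whatever mask tensors the model expects):

model.eval()  # switch to evaluation mode before testing
with torch.no_grad():
    x2 = x.repeat(2, 1)           # duplicate the single row along dim=0 to fake batch_size == 2
    v = model(x2, mask0, mask1)   # both rows should come back identical
    print(v[0].item())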
The problem is that each model returns only a single constant value (which value it is depends on the model), no matter what my input is. If I input more than one distinct row, then I get varied and seemingly sensible value outputs.
I understand how the batch normalization layer works, and with my duplicated input (effectively batch_size == 1), x - mean is zero for every feature, so my final batch norm layer, self.value_batchnorm, will always output the same constant tensor (the normalized values are all zero, leaving just the layer's bias). That constant tensor is then fed into the final linear layer and the sigmoid. It makes perfect sense why this gives only one output.
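To illustrate what I mean (a minimal, self-contained repro with a bare BatchNorm1d rather than my actual model):

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
bn.train()  # in training mode, batch statistics are used
x = torch.randn(1, 4).repeat(2, 1)  # a batch of two identical rows
print(bn(x))  # x - mean(x) is zero per feature, so the output is all zeros (the default bias)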
Still, this seems like I might be using the layer itself wrong, perhaps missing some specific eval/train setting. Is it simply the case that, to get a valid inference from a model that uses batch norm, I must submit my sample as part of a larger batch?
Topic pytorch batch-normalization
Category Data Science