Should batch normalization make my eval inference so dependent on the batch size?
I am using PyTorch; the relevant piece of code, from my model definition and its .forward call, is below:
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModelDense(nn.Module):
    def __init__(self, raw_features, n, features):
        super(ModelDense, self).__init__()
        self.linear_pre = nn.Linear(raw_features, features)
        self.batchnorm_pre = nn.BatchNorm1d(features)
        self.tower = ResTowerDense(n, features)  # stack of residual blocks, defined elsewhere
        self.value_linear1 = nn.Linear(features, features)
        self.value_batchnorm = nn.BatchNorm1d(features)
        self.value_linear2 = nn.Linear(features, 1)

    def forward(self, x, mask0, mask1):
        y = self.tower(self.batchnorm_pre(self.linear_pre(x)))
        v = torch.sigmoid(self.value_linear2(self.value_batchnorm(F.relu(self.value_linear1(y)))))
        return v
Here 'self.tower' is a tower of residual blocks. The output in question is 'v', a single sigmoid-activated value per input row.
After training multiple networks (same topology apart from the width and depth of the tower, and different training hyperparameters), I tested the output by running one input at a time. I made sure to call model.eval() first.
The batch norm layers would throw an error if I passed an input with batch_size == 1, so as a cheat I simply copied my input along dim=0 so that it was batch_size == 2.
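Concretely, my single-input test looks roughly like this (a sketch, not my exact script; x is one input row of shape (1, raw_features), and mask0/mask1 stand in for whatever mask tensors the model expects):

model.eval()  # switch to evaluation mode before testing
with torch.no_grad():
    x2 = x.repeat(2, 1)           # duplicate the single row along dim=0 to fake batch_size == 2
    v = model(x2, mask0, mask1)   # both rows should come back identical
    print(v[0].item())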
The problem is that each model returns only a single constant value (which value it is depends on the model), no matter what my input is. If I input more than one distinct row, then I get varied and seemingly sensible value outputs.
I understand how the batch normalization layer works, and with my duplicated input (effectively batch_size == 1), x - mean is zero for every feature, so my final batch norm layer, self.value_batchnorm, will always output the same constant tensor (the normalized values are all zero, leaving just the layer's bias). That constant tensor is then fed into the final linear layer and the sigmoid. It makes perfect sense why this gives only one output.
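To illustrate what I mean (a minimal, self-contained repro with a bare BatchNorm1d rather than my actual model):

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
bn.train()  # in training mode, batch statistics are used
x = torch.randn(1, 4).repeat(2, 1)  # a batch of two identical rows
print(bn(x))  # x - mean(x) is zero per feature, so the output is all zeros (the default bias)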
Still, this seems like I might be using the layer itself wrong, perhaps missing some specific eval/train setting. Is it simply the case that, to get a valid inference from a model that uses batch norm, I must submit my sample as part of a larger batch?
Topic pytorch batch-normalization
Category Data Science