Why are mini-batches degrading my conv net MNIST classifier?

I have made a convolutional neural network from scratch in Python to classify the MNIST handwritten digits (centered). It consists of a single convolutional layer with eight 3x3 kernels, a 2x2 max-pooling layer, and a 10-node dense layer with softmax as the activation function. I am using cross-entropy loss and SGD.
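For concreteness, here is a rough PyTorch equivalent of that architecture (my implementation is from scratch, so this sketch only pins down the layer shapes, not my actual code):

```python
import torch.nn as nn

# PyTorch equivalent of the described layers (the real code is written from scratch).
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),   # eight 3x3 kernels: 1x28x28 -> 8x26x26
    nn.MaxPool2d(2),                  # 2x2 max pooling:   8x26x26 -> 8x13x13
    nn.Flatten(),
    nn.Linear(8 * 13 * 13, 10),       # 10-node dense layer producing class logits
)
# Softmax + cross-entropy are combined here: nn.CrossEntropyLoss applies
# log-softmax to the raw logits internally.
loss_fn = nn.CrossEntropyLoss()
```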

When I train the network on the whole training set for a single epoch with a batch size of 1, I get 95% accuracy. However, when I try a larger batch size (16, 32, or 128), the learning becomes very noisy and the final accuracy is anywhere between 47% and 86%. Why does my network perform so much worse, and learn so much more noisily, with mini-batches?
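The training procedure is plain mini-batch SGD; in PyTorch terms it would look roughly like the following (assuming the `model` and `loss_fn` from the sketch above, with torchvision supplying the data):

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Mini-batch SGD loop for comparing batch sizes (1 vs. 16/32/128).
train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())

def train_one_epoch(batch_size, lr=0.01):
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for images, labels in loader:
        optimizer.zero_grad()
        # CrossEntropyLoss averages over the batch by default (reduction="mean"),
        # so the per-step gradient magnitude stays comparable across batch sizes.
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```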

Topic: mini-batch-gradient-descent, convolutional-neural-network, gradient-descent, neural-network

Category: Data Science


Your model is very small for a convnet: one conv layer, one max-pool, and one fully connected layer is very shallow. Try adding more conv layers, each followed by BatchNorm2d and ReLU, and drop the pooling layers.
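A sketch of what that might look like (the layer widths 16/32 are illustrative, not a prescription):

```python
import torch.nn as nn

# A deeper conv -> BatchNorm2d -> ReLU stack with no pooling layers.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 28 * 28, 10),  # padding=1 keeps MNIST at 28x28 throughout
)
```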
