Why are mini-batches degrading my conv net MNIST classifier?

I have made a convolutional neural network from scratch in Python to classify the MNIST handwritten digits (centered). It consists of a single convolutional layer with eight 3x3 kernels, a 2x2 max-pooling layer, and a 10-node dense layer with softmax as the activation function. I am using cross-entropy loss and SGD.
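For concreteness, here is a rough PyTorch equivalent of that architecture (my implementation is from scratch, so this sketch only pins down the layer shapes, not my actual code):

```python
import torch.nn as nn

# PyTorch equivalent of the described layers (the real code is written from scratch).
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),   # eight 3x3 kernels: 1x28x28 -> 8x26x26
    nn.MaxPool2d(2),                  # 2x2 max pooling:   8x26x26 -> 8x13x13
    nn.Flatten(),
    nn.Linear(8 * 13 * 13, 10),       # 10-node dense layer producing class logits
)
# Softmax + cross-entropy are combined here: nn.CrossEntropyLoss applies
# log-softmax to the raw logits internally.
loss_fn = nn.CrossEntropyLoss()
```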

When I train the network on the whole training set for a single epoch with a batch size of 1, I get 95% accuracy. However, when I try a larger batch size (16, 32, or 128), the learning becomes very noisy and the final accuracy is anywhere between 47% and 86%. Why does my network perform so much worse, and learn so much more noisily, with mini-batches?
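The training procedure is plain mini-batch SGD; in PyTorch terms it would look roughly like the following (assuming the `model` and `loss_fn` from the sketch above, with torchvision supplying the data):

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Mini-batch SGD loop for comparing batch sizes (1 vs. 16/32/128).
train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())

def train_one_epoch(batch_size, lr=0.01):
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for images, labels in loader:
        optimizer.zero_grad()
        # CrossEntropyLoss averages over the batch by default (reduction="mean"),
        # so the per-step gradient magnitude stays comparable across batch sizes.
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```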

Topic: mini-batch-gradient-descent, convolutional-neural-network, gradient-descent, neural-network

Category: Data Science


Your model is very small for a convnet: one conv layer, one max-pool, and one fully connected layer is very shallow. Try adding more conv layers, each followed by BatchNorm2d and ReLU, and drop the pooling layers.
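A sketch of what that might look like (the layer widths 16/32 are illustrative, not a prescription):

```python
import torch.nn as nn

# A deeper conv -> BatchNorm2d -> ReLU stack with no pooling layers.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 28 * 28, 10),  # padding=1 keeps MNIST at 28x28 throughout
)
```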
