What are some general tips to improve my MNIST classifier?

I have built a CNN from scratch in Python using NumPy to tackle the MNIST hand-written digit recognition problem. It is composed of a convolutional layer (3 3x3 filters), a max-pooling layer (2x2 pooling), and the 10-label output layer. I'm using softmax as the output activation function and cross-entropy as the loss function.

I've tried running it with a couple of different hyperparameters, and so far the best accuracy I've gotten is 97%, when training on the whole train dataset (60,000 images) for a single epoch using SGD. The accuracy does vary a bit though, usually around 92-95% under these conditions. I've only tried one epoch on the whole dataset because it already takes about 15 minutes for my algorithm to train on 60,000 images (on the CPU of my low-grade school laptop).

The thing is that I don't have a sense of how good or bad this is, i.e. how much time one would expect such a network to take and how accurate it would be. Is this really slow and inaccurate? I'd love some general tips on how I can improve my network, whether through optimization methods or brute force (more layers/neurons). I've also tried implementing mini-batches, but for some reason (maybe a faulty implementation?) this only seems to decrease the accuracy.
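On the mini-batch point: a common implementation bug is summing per-example gradients instead of averaging them, which effectively multiplies the learning rate by the batch size and can hurt accuracy. Here is a minimal NumPy sketch of one mini-batch SGD step on a softmax/cross-entropy output layer; the names and shapes are illustrative, not taken from the asker's code:

```python
import numpy as np

def softmax(z):
    # subtract the row-wise max for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def minibatch_sgd_step(W, b, X, y, lr=0.1):
    """One SGD step on a softmax output layer.

    X: (batch, features), y: integer class labels of shape (batch,).
    The gradient is AVERAGED over the batch -- summing instead is a
    common bug that implicitly scales the learning rate by batch size.
    """
    batch = X.shape[0]
    probs = softmax(X @ W + b)            # (batch, num_classes)
    probs[np.arange(batch), y] -= 1.0     # dL/dlogits for cross-entropy
    grad_W = X.T @ probs / batch          # average, not sum
    grad_b = probs.mean(axis=0)
    return W - lr * grad_W, b - lr * grad_b
```

If switching from single-example SGD to summed mini-batch gradients, the learning rate would otherwise need to shrink by roughly the batch size; averaging keeps the step scale comparable.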

Topic mnist cnn neural-network classification python

Category Data Science


The fact that you are using plain accuracy as the metric may make the model look better than it is; the reported number can be overly optimistic.

For multiclass classification, don't rely on a bare accuracy score alone. In Keras terms, go for categorical_accuracy as the metric and sparse_categorical_crossentropy as the loss (the sparse variant takes integer class labels directly, which is what MNIST provides).
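For reference, in Keras the difference between categorical_crossentropy and sparse_categorical_crossentropy is only the label encoding (one-hot vectors vs. integer class ids); both compute the same loss value. A quick hand-rolled NumPy check of that equivalence:

```python
import numpy as np

# "Categorical" vs "sparse categorical" cross-entropy differ only in
# how the labels are encoded, not in the value computed.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels_int = np.array([0, 1])             # sparse: integer class ids
labels_onehot = np.eye(3)[labels_int]     # categorical: one-hot rows

sparse_ce = -np.log(probs[np.arange(2), labels_int]).mean()
categorical_ce = -(labels_onehot * np.log(probs)).sum(axis=1).mean()

assert np.isclose(sparse_ce, categorical_ce)
```

So the practical reason to prefer the sparse variant on MNIST is convenience: the dataset ships with integer labels, and no one-hot conversion step is needed.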

As for increasing the performance of your model:

1.) Increase your number of epochs to at least 50 or 100.

2.) Use a different metric, as suggested above.

3.) Use Adam or a variant of Adam as the optimizer.

4.) Since you do not mention hyperparameter tuning, I am assuming you haven't done any. Try Keras Tuner for it.
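Since the network is hand-rolled in NumPy rather than built on Keras, point 3 would mean implementing the update rule yourself. A minimal sketch of the standard Adam update (with the usual default hyperparameters; class and method names here are illustrative):

```python
import numpy as np

class Adam:
    """Minimal Adam optimizer for a single NumPy parameter array."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = None   # first-moment (mean) estimate
        self.v = None   # second-moment (uncentered variance) estimate
        self.t = 0      # step counter for bias correction

    def step(self, param, grad):
        if self.m is None:
            self.m = np.zeros_like(param)
            self.v = np.zeros_like(param)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad**2
        m_hat = self.m / (1 - self.b1**self.t)   # bias-corrected moments
        v_hat = self.v / (1 - self.b2**self.t)
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

In a real network you would keep one optimizer state (m, v, t) per parameter tensor; the adaptive per-weight step sizes are what usually make Adam converge faster than plain SGD on a problem like this.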

As far as computational time is concerned, 15 minutes is not expensive for a CNN. CNNs take a long time to train even on a GPU, so 15 minutes on a CPU is nothing unusual.


You may find OpenML interesting.

It shows several benchmarks on different datasets, so you can see how different models score on the same dataset.

For your task (MNIST), the benchmark plot there (image not reproduced here) shows the scores of many submitted models, which should give you a good idea of how well your model performs compared to other implementations.

The Metric button in the upper-left corner lets you compare models in terms of classification metrics and also in terms of execution time.
