Why does my CNN improve neither training accuracy nor validation accuracy despite the error function decreasing drastically?

I have written a Python program that models a convolutional neural network (pastebin link) using only the most basic Python libraries (numpy and math; sklearn and pandas are used only for reading data). I will summarize the code's structure:

  1. Goal: to read the MNIST dataset (1797 8x8 grayscale images in total) and predict which digit is written.

  2. Network type: basic convolutional. The first layer is three 3x3 filters with stride 1 (no padding), the second layer is three 3x3x3 filters with stride 1 (no padding), the third layer is a dense layer of 10 neurons, and the fourth layer is another dense layer of 10 neurons.

  3. The desired output of the network is a one-hot vector, e.g. [0,0,0,0,1,0,0,0,0,0] if the MNIST data point's label reads 4, etc.

  4. Training set = first 700 elements

  5. Validation set = next 1097 elements

  6. First, an array ('inf') of size 700 is created; each element pertains to one training data point.

  7. Each such element holds three sub-elements: the first pertains to the 8x8 = 64 numbers of the given data point, the second to the 3x6x6 numbers produced by the first convolutional layer, and the third to the 3x4x4 numbers produced by the second convolutional layer. All in all, the array 'inf' of 700 elements stores the information generated from the dataset and the first two layers of the CNN.

  8. Another array 'ninf' of 700 elements is created to similarly store all the information generated by the next two (dense) layers of the CNN.

  9. Two arrays storing the CNN's weights are created: one ('cw') for the convolutional layers' weights (that is, the filters' weights along with their biases) and another ('nw') for the dense layers' weights.

  10. The CNN function, comprising the two convolutional layers, generates the information from the dataset and stores it in the array 'inf'. It is followed by the ANN function, comprising the two dense layers, which takes its input from 'inf' and generates each layer's output, storing it in the array 'ninf'. (A simplified sketch of this forward pass appears after this list.)

  11. 1000 epochs are chosen.

  12. The mean squared error function is chosen.

  13. The activation function $0.5 + 0.5\tanh(\cdot)$ is chosen.

  14. All layers of the CNN (including the filters and their biases) are trained via the backpropagation algorithm.

  15. Training accuracy and validation accuracy are thereafter evaluated.
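
To make the architecture concrete, here is a minimal numpy sketch of the forward pass as I understand it from steps 2, 7, 9, 13 and 14. The helper names (act, d_act, one_hot, conv_forward, forward) and the dictionary layout of 'cw' and 'nw' are simplifications for illustration only; the actual pastebin code is organized differently:

```python
import numpy as np

def act(x):
    # Step 13 activation: 0.5 + 0.5*tanh(x), squashing values into (0, 1)
    return 0.5 + 0.5 * np.tanh(x)

def d_act(a):
    # Derivative of act written in terms of its output a (useful for the
    # backpropagation in step 14): 0.5*(1 - tanh(x)^2) = 2*a*(1 - a)
    return 2.0 * a * (1.0 - a)

def one_hot(label):
    # Step 3 target: e.g. label 4 -> [0,0,0,0,1,0,0,0,0,0]
    t = np.zeros(10)
    t[label] = 1.0
    return t

def conv_forward(x, filters, biases):
    # 'Valid' convolution with stride 1, no padding.
    # x: (C_in, H, W), filters: (C_out, C_in, 3, 3), biases: (C_out,)
    c_out, _, kh, kw = filters.shape
    h_out, w_out = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((c_out, h_out, w_out))
    for f in range(c_out):
        for i in range(h_out):
            for j in range(w_out):
                out[f, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * filters[f]) + biases[f]
    return act(out)

def forward(image, cw, nw):
    # image: (8, 8) grayscale digit; returns the 10 outputs of the last layer.
    x = image[np.newaxis, :, :]                              # (1, 8, 8)
    x = conv_forward(x, cw[0]['filters'], cw[0]['biases'])   # (3, 6, 6)
    x = conv_forward(x, cw[1]['filters'], cw[1]['biases'])   # (3, 4, 4)
    x = x.reshape(-1)                                        # flatten to 48 values
    x = act(nw[0]['W'] @ x + nw[0]['b'])                     # dense layer, 10 neurons
    x = act(nw[1]['W'] @ x + nw[1]['b'])                     # dense layer, 10 neurons
    return x
```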

What I observed after following the above steps is that the error function decreased from around 6300 to less than 1293. (The maximum is 7000, since each of the 700 data points can contribute at most 10 to the error; a starting value around 6300 is what one would expect from essentially random outputs.)
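
For concreteness, the total error I am describing is computed along these lines (reusing the helpers from the sketch above; again a simplification of the actual code):

```python
def total_squared_error(images, labels, cw, nw):
    # Sum of squared differences between the 10 outputs and the one-hot
    # target, accumulated over the data set; each point contributes at most 10.
    return sum(np.sum((forward(img, cw, nw) - one_hot(y)) ** 2)
               for img, y in zip(images, labels))
```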

However, the training accuracy (the proportion of correct predictions from the CNN, obtained by taking an argmax of the output) as well as the validation accuracy were both around 10% initially, and even after the 1000 epochs they remained at 10%. Neither accuracy measure improved significantly despite the drastic reduction in the error function.
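
The accuracy measure is computed roughly like this (again reusing the helpers sketched above):

```python
def accuracy(images, labels, cw, nw):
    # A prediction counts as correct when the argmax of the 10 outputs
    # matches the label.
    correct = sum(int(np.argmax(forward(img, cw, nw)) == y)
                  for img, y in zip(images, labels))
    return correct / len(labels)
```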

I initially suspected that the choice of error function was to blame. However, replacing the least-squares error function with a softmax-based loss still failed to produce any significant improvement in the training and validation accuracies.
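
By "softmax-based loss" I mean a formulation along these lines, where the last layer's outputs are passed through a softmax and scored with cross-entropy (a sketch, not the exact code):

```python
def softmax(z):
    # Numerically stable softmax: shift by the max before exponentiating.
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def cross_entropy(p, label):
    # Negative log-probability of the true class; 1e-12 guards against log(0).
    return -np.log(p[label] + 1e-12)
```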

Note that, in the pastebin link above, I used the weights obtained after running those 1000 epochs as the initial values for the weight arrays 'cw' and 'nw'.

Topic mnist cnn accuracy

Category Data Science
