Convolutional neural network overfitting. Dropout not helping

I am playing a little with convnets. Specifically, I am using the Kaggle cats-vs-dogs dataset, which consists of 25,000 images labeled as either cat or dog (12,500 each).

I've managed to achieve around 85% classification accuracy on my test set; however, my goal is to reach 90%.

My main problem is overfitting: it always ends up happening, normally after epoch 8-10. The architecture of my network is loosely inspired by VGG-16. More specifically, my images are resized to $128 \times 128 \times 3$, and then I run:

Convolution 1 128x128x32 (kernel size is 3, strides is 1)
Convolution 2 128x128x32 (kernel size is 3, strides is 1)
Max pool    1 64x64x32   (kernel size is 2, strides is 2)
Convolution 3 64x64x64   (kernel size is 3, strides is 1)
Convolution 4 64x64x64   (kernel size is 3, strides is 1)
Max pool    2 32x32x64   (kernel size is 2, strides is 2)
Convolution 5 16x16x128  (kernel size is 3, strides is 1)
Convolution 6 16x16x128  (kernel size is 3, strides is 1)
Max pool    3 8x8x128    (kernel size is 2, strides is 2)
Convolution 7 8x8x256    (kernel size is 3, strides is 1)
Max pool    4 4x4x256    (kernel size is 2, strides is 2)
Convolution 8 4x4x512    (kernel size is 3, strides is 1)
Fully connected layer 1024 (dropout 0.5)
Fully connected layer 1024 (dropout 0.5)

All layers except the last one use ReLU activation functions.
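For reference, a rough tf.keras equivalent of this stack would look something like the sketch below (my actual code uses the low-level TensorFlow API; the sketch assumes 'same' padding and a single sigmoid output, so the spatial sizes are simply the ones produced by the four pooling steps):

    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_model(input_shape=(128, 128, 3)):
        # Convolution/pooling stack loosely inspired by VGG-16
        model = models.Sequential([
            layers.Conv2D(32, 3, padding='same', activation='relu',
                          input_shape=input_shape),
            layers.Conv2D(32, 3, padding='same', activation='relu'),
            layers.MaxPooling2D(2),                      # -> 64x64x32
            layers.Conv2D(64, 3, padding='same', activation='relu'),
            layers.Conv2D(64, 3, padding='same', activation='relu'),
            layers.MaxPooling2D(2),                      # -> 32x32x64
            layers.Conv2D(128, 3, padding='same', activation='relu'),
            layers.Conv2D(128, 3, padding='same', activation='relu'),
            layers.MaxPooling2D(2),                      # -> 16x16x128
            layers.Conv2D(256, 3, padding='same', activation='relu'),
            layers.MaxPooling2D(2),                      # -> 8x8x256
            layers.Conv2D(512, 3, padding='same', activation='relu'),
            layers.Flatten(),
            layers.Dense(1024, activation='relu'),
            layers.Dropout(0.5),
            layers.Dense(1024, activation='relu'),
            layers.Dropout(0.5),
            layers.Dense(1, activation='sigmoid'),       # cat vs. dog
        ])
        return model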

Note that I have tried different combinations of convolutional layers (I started with simpler architectures).

Also, I have augmented the dataset by mirroring the images, so that in total I have 50,000 images.
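The mirroring itself is simple; a sketch of what it could look like with NumPy (here `images` and `labels` are hypothetical arrays holding the resized dataset):

    import numpy as np

    def mirror_augment(images, labels):
        # images has shape (N, 128, 128, 3); flip along the width axis
        flipped = images[:, :, ::-1, :]
        return (np.concatenate([images, flipped], axis=0),
                np.concatenate([labels, labels], axis=0))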

Also, I am normalizing the images using min-max normalization, where $X$ is the image:

$X = \frac{X - 0}{255 - 0}$
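In code this amounts to the following (assuming `X` holds the raw pixel values as floats):

    # Min-max normalization with min = 0 and max = 255, i.e. scale pixels to [0, 1]
    X = (X - 0.0) / (255.0 - 0.0)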

The code is written in TensorFlow and the batch size is 128.

The training mini-batches end up overfitting, reaching an accuracy of 100%, while the validation accuracy seems to stop improving at around 84-85%.

I have also tried to increase/decrease the dropout rate.

The optimizer being used is AdamOptimizer with a learning rate of 0.0001.
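In TensorFlow 1.x terms this is something like the line below (`loss` is the hypothetical cross-entropy tensor defined elsewhere in the graph):

    import tensorflow as tf

    # Adam with the learning rate mentioned above
    train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss)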

I have been working on this problem for the last 3 weeks, and 85% seems to be a barrier I cannot get past.

For the record, I know I could use transfer learning to achieve much better results, but I am interested in building this network from scratch as a self-learning experience.

Update:

I am running the SAME network with a much smaller batch size (16 instead of 128), and so far I am achieving 87.5% accuracy (instead of 85%). That said, the network still ends up overfitting. I do not understand how dropping out 50% of the units is not helping; obviously I am doing something wrong here. Any ideas?

Update 2:

It seems the problem had to do with the batch size. With a smaller size (16 instead of 128) I am now achieving 92.8% accuracy on my test set. The network still overfits (the mini-batches end up with an accuracy of 100%), but the loss keeps decreasing and is in general more stable. The downside is a MUCH slower running time, but it is totally worth the wait.

Topic convolutional-neural-network dropout image-recognition deep-learning neural-network

Category Data Science


Ok, so after a lot of experimentation I have managed to get some results/insights.

First, all else being equal, smaller training batches help a lot to improve the general performance of the network; the downside is that the training process is much slower.

Second, data matters. Nothing new here, but as I learned while fighting this problem, more data always seems to help a bit.

Third, dropout is useful in large networks with lots of data and lots of iterations. In my network I applied dropout only to the final fully connected layers; the convolutional layers did not get dropout.

Fourth (and this is something I keep learning over and over): neural networks take A LOT of time to train, even on good GPUs (I trained this network on FloydHub, which uses quite expensive NVIDIA cards), so PATIENCE is key.

Final conclusion: batch size is more important than one might think; apparently it is easier to get stuck in a local minimum when batches are larger.

The code I wrote is available as a Python notebook; I think it is decently documented.


I had this problem too. After tinkering with it for hours, I decided by chance to shuffle the data before feeding it into the network, and voilà, it started working. It took me a while to figure out that it was the shuffling that did the trick! I hope this saves somebody some frustration!
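In case it helps, a sketch of what the shuffling could look like (a simple NumPy version; `images` and `labels` are hypothetical arrays with matching first dimensions):

    import numpy as np

    def shuffle_together(images, labels):
        # Draw one random permutation and apply it to both arrays so that
        # image/label pairs stay aligned.
        perm = np.random.permutation(len(images))
        return images[perm], labels[perm]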


One thing that hasn't been mentioned yet, and that you can consider for the future: you can still increase the dropout rate on the fully connected layers.

I once read a paper that used a 90% dropout rate. Although its layers had many nodes (2048, if I recall correctly), I have tried this myself on layers with fewer nodes and it was very helpful in some cases.

I tried to look up which paper it was; I can't recall the exact one, but I found these, which also had some success with 90% dropout rates.

Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014). Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (pp. 1725-1732).

Simonyan, K., & Zisserman, A. (2014). Two-stream convolutional networks for action recognition in videos. In Advances in neural information processing systems (pp. 568-576).

Varol, G., Laptev, I., & Schmid, C. (2017). Long-term temporal convolutions for action recognition. IEEE transactions on pattern analysis and machine intelligence.


There are several possible solutions for your problem.

  1. Use dropout in the earlier (convolutional) layers too; see the sketch after this list.

  2. Your network seems rather big for such an "easy" task; try to reduce its size. Big architectures like VGG-16 are also trained on much bigger datasets.

If you want to keep your "big" architecture, try:

  1. Image augmentation in order to virtually increase your training data

  2. Try adversarial training. It sometimes helps.
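As a sketch of the first suggestion, dropout can be added after each pooling stage, for example (assuming tf.keras and an illustrative rate of 0.25; the exact rate is something to tune):

    from tensorflow.keras import layers

    def conv_block(x, filters, drop_rate=0.25):
        # Two 3x3 convolutions, max pooling, then dropout on the pooled feature map
        x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
        x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
        x = layers.MaxPooling2D(2)(x)
        return layers.Dropout(drop_rate)(x)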


I suggest you analyze the learning curves of your validation accuracy, as Neil Slater suggested. Then, if the validation accuracy drops, try to reduce the size of your network (it seems too deep), add dropout to the convolutional layers, and add BatchNormalization after each layer. This can help get rid of overfitting and increase the test accuracy.
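A sketch of what "BatchNormalization after each layer" could look like for one convolutional layer (assuming tf.keras; placing the normalization between the convolution and the ReLU is a common choice):

    from tensorflow.keras import layers

    def conv_bn(x, filters):
        # 3x3 convolution -> batch normalization -> ReLU
        x = layers.Conv2D(filters, 3, padding='same', use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        return layers.Activation('relu')(x)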
