Studying and choosing between different neural network structures

I would like to develop a model that uses convolutional neural networks for image classification. From the many different network structures described in papers and articles online, I would like to choose, as a starting point, the one that best suits my problem and dataset.

I know that there is no certain answer and the best structure is highly dependent on each problem, but I imagine that there is some method behind building such a network beyond pure chance and testing. What properties and hyperparameters should I pay attention to when reading papers and comparing structures? In order to acquire this intuition about different models, is it better to read more literature or focus on experimenting with different models?

Though I have special interest in convolutional neural networks, this question also applies to studying the architecture of neural networks in general.



First of all, get familiar with the standard benchmark dataset, ImageNet. One reason it is so widely used is that a technique that proves effective on ImageNet is often useful on other image datasets as well, although this is not guaranteed.

After that, start reading the papers, or find good articles, blogs, and tutorials about them. For each paper, try to answer one question: what novel contribution did it make? Work your way from LeNet up to the latest state-of-the-art models. If you want to understand these papers really well, implement them from scratch. It will take a long time, but it gives you the best intuition about CNN architectures: what people have explored so far, what worked for them, what is still widely used, and what has fallen out of use.

For example, one of the main contributions of the ResNet paper was the skip connection, and recent CNNs such as EfficientNet still use skip connections. That tells you skip connections are worth trying. The VGG paper, on the other hand, did not use a global average pooling layer. It flattens the last feature map and feeds it into large fully connected layers, which makes VGG one of the heaviest models. The lesson is that leaving out global average pooling can increase the parameter count by a lot. Finally, the EfficientNet paper will give you a very good idea of how to design and scale CNN architectures.

You will pick up many insights and methods like these by reading the papers, and this knowledge will be very useful when you build CNNs from scratch.
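To put a rough number on the VGG point, here is a small back-of-the-envelope sketch. It assumes the VGG-16 layout for a 224x224 input (a 7x7x512 final feature map followed by FC-4096, FC-4096, FC-1000) and compares it with a hypothetical head that applies global average pooling before a single fully connected layer. The GAP head is an illustration only, not something taken from the VGG paper, and `dense_params` is a helper name made up for this example.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# Final convolutional feature map of VGG-16 for a 224x224 input: 7 x 7 x 512.
height, width, channels = 7, 7, 512
num_classes = 1000

# VGG-style head: flatten the feature map, then FC-4096, FC-4096, FC-1000.
flatten_head = (
    dense_params(height * width * channels, 4096)
    + dense_params(4096, 4096)
    + dense_params(4096, num_classes)
)

# Hypothetical alternative (assumption, not from the paper): global average
# pooling reduces each channel to one value, followed by one FC layer.
gap_head = dense_params(channels, num_classes)

print(f"flatten + FC head: {flatten_head:,} parameters")  # 123,642,856
print(f"GAP + FC head:     {gap_head:,} parameters")      # 513,000
```

Under these assumptions the classifier head alone accounts for roughly 124 million parameters, most of them in the first fully connected layer, while the pooled head needs about half a million.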

Now, once you have this theoretical knowledge, let's talk about applying it to a new dataset. This still requires trial and error, but your theoretical intuition will give you a big head start. Here is the procedure I use for that trial and error:

  1. If the dataset is very big, I create a small subset of it.
  2. I create a reliable validation set, one that is representative of the data the model will see.
  3. Because the subset is small, I can run many experiments on it. I start with very simple models and work up to more complex techniques until I get satisfactory performance on the validation set.
  4. Once I have a good architecture (one with satisfactory accuracy), I train it on the whole dataset. I also try scaling it up (see the EfficientNet paper).
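Steps 1 and 2 could be sketched roughly as follows. This is a minimal illustration using only the Python standard library. The function `stratified_split`, the toy label list, and the 10% and 20% fractions are all assumptions made for the example, and sampling per class (stratification) is just one reasonable way to keep the small subset and the validation set representative.

```python
import random
from collections import defaultdict


def stratified_split(labels, fraction, seed=0):
    """Return (picked, rest) index lists; picked holds about `fraction` of each class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    picked, rest = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Keep at least one example per class so no class disappears.
        k = max(1, round(len(indices) * fraction))
        picked.extend(indices[:k])
        rest.extend(indices[k:])
    return picked, rest


# Toy labels standing in for a large, imbalanced dataset (hypothetical data).
labels = ["cat"] * 600 + ["dog"] * 300 + ["bird"] * 100

# Step 1: keep about 10% of the data as the small experimentation subset.
subset, _ = stratified_split(labels, fraction=0.10, seed=0)

# Step 2: hold out about 20% of that subset as the validation set.
subset_labels = [labels[i] for i in subset]
val_pos, train_pos = stratified_split(subset_labels, fraction=0.20, seed=1)
val_idx = [subset[p] for p in val_pos]
train_idx = [subset[p] for p in train_pos]

print(len(subset), len(train_idx), len(val_idx))  # 100 80 20
```

The resulting index lists can then be used to select images for quick experiments, and the same split function can be reused on the full dataset once an architecture looks promising.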

Lastly, I would like to mention that model architecture is not the only thing that makes a difference. Novel training methodologies such as student-teacher learning and meta pseudo labels can also boost performance significantly.
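As a rough illustration of the student-teacher idea, here is a sketch of one common formulation of a distillation loss: a cross-entropy term on the true label plus a KL-divergence term that pulls the student's temperature-softened predictions toward the teacher's. The function names, the temperature of 4.0, the weight `alpha`, and the example logits are assumptions chosen for this example. It shows the loss for a single example only, not a full training loop.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_class,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and soft-label KL divergence."""
    # Hard-label term: cross-entropy of the student against the true class.
    hard = -math.log(softmax(student_logits)[true_class])

    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(pt * math.log(pt / ps)
               for pt, ps in zip(p_teacher, p_student) if pt > 0)

    # Scaling the soft term by temperature squared is a common convention
    # to keep its magnitude comparable to the hard term.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


teacher = [5.0, 1.0, 0.0]          # hypothetical teacher logits
agreeing_student = [5.0, 1.0, 0.0]
disagreeing_student = [0.0, 1.0, 5.0]

loss_agree = distillation_loss(agreeing_student, teacher, true_class=0)
loss_disagree = distillation_loss(disagreeing_student, teacher, true_class=0)
print(loss_agree < loss_disagree)  # True
```

A student that matches the teacher gets a soft term of zero, so its loss comes only from the hard-label term, while a student that disagrees with the teacher is penalized by both terms.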
