Is this a case of overfitting?

I am training a CNN-LSTM-FC network with around 1 million parameters for spatial time series prediction, but the validation loss does not even come close to the training loss after 1-2 epochs. Is this a clear case of overfitting? Any inputs would be appreciated! I am using Keras with the TensorFlow backend.
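For context, here is a minimal sketch of the kind of CNN-LSTM-FC setup described above. The input shape (10 time steps of 64x64 single-channel frames) and all layer sizes are assumptions, since the actual architecture is not shown; the point is only how the training/validation loss gap would typically be observed from the Keras history.

```python
import numpy as np
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(10, 64, 64, 1)),          # hypothetical input shape
    # CNN part, applied to every frame in the sequence
    layers.TimeDistributed(layers.Conv2D(16, 3, activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.Flatten()),
    # LSTM part over the per-frame features
    layers.LSTM(32),
    # FC part producing the prediction
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Dummy data just to make the sketch runnable
x = np.random.rand(100, 10, 64, 64, 1)
y = np.random.rand(100, 1)

# Comparing these two curves is how the gap between training and
# validation loss in the question shows up.
history = model.fit(x, y, validation_split=0.2, epochs=2, verbose=0)
print(history.history["loss"], history.history["val_loss"])
```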




To me it looks like a clear case of overfitting, and perhaps the main reason is that your model is far too complex for the problem. In order to differentiate between overfitting and underfitting, you can think about learning in the following way.

The data given as examples (the training data) contains the behavior of the true model plus some additional noise, and you cannot clearly separate the two. When you learn, what you actually do is try to approximate the true function with a model of some sort.

In this process you bring in some learning biases: you assume certain possible forms for the true function (each learning method can only approximate certain types of functions) and also a certain shape for the noise (also encoded in the model structure and its parameters).

Here you can encounter two situations. If the model is unable to build an approximation by construction (consider trying to fit a linear model to a sinusoidal true function with noise, for example), then you will not be able to learn much from the training data, and the test data will show a similar error. This is because a less flexible model is very robust (at the extreme you can fit a constant model for regression; the constant model will usually have a similar error on train and test).
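A small illustration of that linear-model-on-a-sine example (the data, noise level, and sample sizes are made up for the demonstration): the model is too rigid, so training and test errors are both high and roughly similar.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 2 * np.pi, 50))
x_test = np.sort(rng.uniform(0, 2 * np.pi, 50))
y_train = np.sin(x_train) + rng.normal(0, 0.2, 50)   # true function + noise
y_test = np.sin(x_test) + rng.normal(0, 0.2, 50)

coeffs = np.polyfit(x_train, y_train, deg=1)          # linear model: too rigid
mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
print(f"train MSE: {mse(x_train, y_train):.3f}, test MSE: {mse(x_test, y_test):.3f}")
# Both errors stay high and close to each other: the model underfits,
# but it is robust.
```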

If your model is too flexible, then you can approximate your true function but also more than that: you also incorporate many of the errors from the training sample. That means you approximate the training data too well (low error on training), and you will usually be far from the test data (whose errors differ from the training errors), so you end up with high errors there.
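The flip side, on the same kind of noisy sine data: a very flexible model (here a high-degree polynomial, chosen purely for illustration) chases the training noise, so the training error drops well below the test error.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 2 * np.pi, 30))
x_test = np.sort(rng.uniform(0, 2 * np.pi, 30))
y_train = np.sin(x_train) + rng.normal(0, 0.2, 30)
y_test = np.sin(x_test) + rng.normal(0, 0.2, 30)

coeffs = np.polyfit(x_train, y_train, deg=15)         # very flexible model
mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
print(f"train MSE: {mse(x_train, y_train):.3f}, test MSE: {mse(x_test, y_test):.3f}")
# The training error is close to zero while the test error is much larger:
# the model has absorbed the training noise.
```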

Since nobody can separate the true function from the noise, neither can a learning model. Very often, the first questions when you start to build models for a problem are about how complex the problem is and how flexible your model is. Since you are talking about deep nets, a million parameters is a lot (despite the general indecent hype around the topic, where adding millions of additional parameters and burning a lot of money on training is a happy activity). My 2 cents: start with simpler models, especially ones that underfit, which can give you an idea of what kind of battle you have to win.
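One way to follow that advice in Keras is to build a deliberately small baseline first, for example flattening the spatio-temporal input and using a single small dense layer. The shapes and sizes below are assumptions carried over from the sketch above; the point is only to have a weak reference model to compare the million-parameter network against.

```python
from tensorflow.keras import layers, models

baseline = models.Sequential([
    layers.Input(shape=(10, 64, 64, 1)),  # same hypothetical input as above
    layers.Flatten(),
    layers.Dense(16, activation="relu"),  # deliberately tiny capacity
    layers.Dense(1),
])
baseline.compile(optimizer="adam", loss="mse")
# If the big CNN-LSTM-FC model barely beats this baseline on validation data,
# its extra capacity is mostly being spent on memorizing training noise.
```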


I would rather say that underfitting occurs in your training, because the training values are much better than the test values. This can happen when your memory cells cannot store enough information, so maybe try to increase the number of units or nodes in your LSTM so the loss can converge better.
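If you try this suggestion, increasing capacity is a one-line change on the LSTM layer. The unit counts below are purely illustrative, since the question does not state the actual layer sizes.

```python
from tensorflow.keras import layers

# e.g. replace a small memory layer ...
small_lstm = layers.LSTM(32)
# ... with a larger one that can store more information per time step
large_lstm = layers.LSTM(128)
```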
