Interpretation of learning curve - neural network

When I run my three different neural networks I obtain the following learning curves using MSE.

I believe that my base model is okay and is neither overfitting nor underfitting. Furthermore, I believe that my small model is underfitting, because both the training error and the validation error are high. However, I'm not sure about the big model. Taking the square root of the MSE ($\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$), the RMSE on both the train set and the validation set is lower for the big model than for the base model. Yet, judging by the picture and by what I have learned in class, it is still underfitting?

Is this correct? I just do not understand how the model can perform well when, looking at the picture, it does not seem to learn.

Thank you in advance.



In principle, I would agree that model small and model big are underfitting. It would be helpful if you provided more information about the data (number of samples, number of predictors and their distributions, and the number, range, and distribution of the target variables) to fully analyze the plots.

It's not easy to compare the plots you showed:

  1. Firstly, the range of the $y$-axis is not the same in the three plots: in the first plot it goes up to 4,000,000, while in the other two it only reaches about 125,000 and 2,500 respectively. This makes it hard to judge where exactly model base converges.

  2. Secondly, you only provide one run. It may be that this run was particularly (un)lucky. The analysis would be more robust if you had, e.g., three runs per model, plotted with their mean and standard deviation; see the sketch after this list.

  3. Lastly, the first model is trained for $20$ epochs while the other two are trained for $15$.
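
On the second point, here is a minimal sketch of how one might aggregate several runs, assuming a hypothetical `build_model()` that returns a freshly initialized, compiled Keras model and the usual `X_train`/`y_train`/`X_val`/`y_val` splits:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical: build_model() returns a freshly initialized, compiled
# Keras model; X_train, y_train, X_val, y_val are your data splits.
n_runs = 3
histories = []
for _ in range(n_runs):
    model = build_model()  # re-initialize the weights for every run
    h = model.fit(X_train, y_train,
                  validation_data=(X_val, y_val),
                  epochs=20, verbose=0)
    histories.append(h.history["val_loss"])

runs = np.array(histories)               # shape: (n_runs, n_epochs)
mean, std = runs.mean(axis=0), runs.std(axis=0)
epochs = np.arange(1, runs.shape[1] + 1)

plt.plot(epochs, mean, label="mean validation MSE")
plt.fill_between(epochs, mean - std, mean + std, alpha=0.3)
plt.yscale("log")  # log scale makes losses of very different magnitudes comparable
plt.xlabel("epoch")
plt.ylabel("MSE")
plt.legend()
plt.show()
```

The log scale also addresses the first point: it lets you put all three models on one comparable axis even when their losses differ by orders of magnitude.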

Also, what is the difference between `loss` and `mean_squared_error` in your plots? They seem identical.
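If the network was compiled with MSE as both the loss and a metric, Keras logs the same quantity under the two names, which would explain the overlap. A sketch, assuming `model` is your already-built network:

```python
# With MSE as both the loss and a metric, Keras records the same value
# under two names ("loss" and "mean_squared_error"), so the two curves
# coincide (apart from any regularization terms added to the loss).
model.compile(optimizer="adam",
              loss="mse",
              metrics=["mean_squared_error"])
```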


Based on the learning curves shown, the best model is the base model, as both the training error and the validation error decrease as the epochs increase.

Now, both your small and big models are underfitting. This is apparent from the fact that their curves do not converge the way they do for the base model. I would say there is something wrong with the neural nets you have constructed; you are probably missing something vital in your setup (maybe cross-validation, sketched below).
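If cross-validation is indeed the missing piece, a minimal sketch using scikit-learn's `KFold`, again with a hypothetical `build_model()` that returns a freshly compiled Keras model:

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical: build_model() returns a freshly compiled Keras model;
# X and y are the full training arrays.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
val_scores = []
for train_idx, val_idx in kf.split(X):
    model = build_model()
    model.fit(X[train_idx], y[train_idx], epochs=15, verbose=0)
    # evaluate() returns the loss (MSE here) when no extra metrics are compiled
    val_scores.append(model.evaluate(X[val_idx], y[val_idx], verbose=0))

print(f"CV MSE: {np.mean(val_scores):.4f} +/- {np.std(val_scores):.4f}")
```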


Your dataset is probably not a complex one, since your Model Base converges within the first few epochs and your Model Big (perhaps because of the number of parameters it has and the non-linearity it can capture) performs well from the beginning.

Notice that Model Small starts at a higher loss than Model Base and, even after 15 epochs, does not reach the loss that Model Base already has at epoch 5. I think you should check the architecture and the weight initialization of Model Small.
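As a starting point for that check, a sketch of an explicit initialization in Keras (the layer sizes and `n_features` are placeholders, not your actual architecture):

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features = 10  # placeholder; replace with your input dimension

# He-style initialization often helps ReLU networks start from a
# reasonable loss instead of a very high one.
model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(16, activation="relu", kernel_initializer="he_normal"),
    layers.Dense(1),
])
model.summary()  # verify layer sizes and parameter counts
```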

The gap between the training loss and the validation loss can also be explained by the magnitude of your target variable: when the targets are large, even small relative errors translate into large absolute MSE values.
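One way to take the magnitude out of the picture is to standardize the target before training, so the MSE of every model lives on a unit scale. A sketch with scikit-learn's `StandardScaler`, assuming 1-D `y_train`/`y_val` arrays:

```python
from sklearn.preprocessing import StandardScaler

# Standardizing the target puts the MSE on a unit scale, so a "large"
# loss no longer just reflects large target magnitudes.
y_scaler = StandardScaler()
y_train_s = y_scaler.fit_transform(y_train.reshape(-1, 1))
y_val_s = y_scaler.transform(y_val.reshape(-1, 1))

# To report errors in the original units, invert the transform:
# y_pred = y_scaler.inverse_transform(model.predict(X_val))
```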

Try cross-validation together with a held-out test set to make sure that your model works well. If it does not, try gathering more data; if that is not possible, try data augmentation.
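For completeness, a sketch of both suggestions; the Gaussian-noise augmentation assumes tabular inputs, which may not match your data:

```python
import numpy as np

# Final check on a held-out test set that was never used for training
# or tuning (assumes a trained `model` and X_test/y_test arrays).
test_mse = model.evaluate(X_test, y_test, verbose=0)
print("test MSE:", test_mse)

# A simple augmentation for tabular data: jitter the inputs with small
# Gaussian noise to enlarge the training set. The noise scale is a
# placeholder and should match the scale of your features.
rng = np.random.default_rng(0)
X_aug = X_train + rng.normal(scale=0.01, size=X_train.shape)
X_train_big = np.concatenate([X_train, X_aug])
y_train_big = np.concatenate([y_train, y_train])
```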
