Learning curves of a classification algorithm

I am trying to understand this learning curve from a classification problem, but I am not sure what to infer. I believe the model is overfitting, but I cannot be sure.

  1. Very low training loss that increases very slightly as training examples are added.
  2. Gradually decreasing validation loss (without flattening) as training examples are added. However, I do not see any gap between the two curves at the end, something that can usually be found in an overfitting model.

On the other hand, the model might be underfitting, because:

  1. The learning curve of an underfit model has a low training loss at the beginning, which gradually increases as training examples are added and then stays flat, indicating that adding more training examples cannot improve the model's performance on unseen data.
  2. The training loss and validation loss are close to each other at the end.

However, the training error is not very big, whereas underfitting models usually show a large training error.

I am confused. Can you please give me some advice?

Topic: bias-variance, classification, machine-learning

Category: Data Science


Since you didn't mention this in the description, let me first emphasize that these graphs show the performance for different sizes of the training set, i.e. each point is obtained by training a separate model on a subset of the data. This is important because it means that every point on the X axis represents a different model.
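
For concreteness, here is a minimal sketch of how curves like these are typically produced with scikit-learn; the synthetic dataset, the logistic-regression model, and the scoring choice are all illustrative assumptions, not taken from your question:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import learning_curve

    # Synthetic data standing in for your dataset (illustrative only).
    X, y = make_classification(n_samples=600, n_features=20, random_state=0)

    # learning_curve trains a *separate* model for each training-set size,
    # which is why every point on the X axis represents a different model.
    train_sizes, train_scores, val_scores = learning_curve(
        LogisticRegression(max_iter=1000),
        X, y,
        train_sizes=np.linspace(0.1, 1.0, 10),  # 10 different training-set sizes
        cv=5,                                   # 5-fold cross-validation per size
        scoring="neg_log_loss",                 # a loss, to match your curves
    )

    # scikit-learn negates losses so that higher is better; flip them back
    # and average across the cross-validation folds.
    train_loss = -train_scores.mean(axis=1)
    val_loss = -val_scores.mean(axis=1)
    for n, tr, va in zip(train_sizes, train_loss, val_loss):
        print(f"n={n:4d}  train loss={tr:.3f}  val loss={va:.3f}")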

Now, what these two graphs show is clear overfitting on the left part: the training loss is very low while the validation loss is much higher. So all the models trained with fewer than around 450-500 instances are overfit. But the two curves converge until the training and validation losses are roughly equal, so the last model, trained with around 500 instances, shows no sign of overfitting.

Underfitting is less common and harder to detect from this kind of curve, but it would be characterized by poor performance even on the training set, i.e. a high training loss. That does not happen here, so I think it can be ruled out. The fact that performance is the same on the training and validation sets only shows that there is no overfitting; it does not imply that the model is underfit.
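
If you prefer a numeric rule of thumb over eyeballing the plot, you can compare the gap between the two curves at the largest training size (the overfitting signal) with the absolute training loss (the underfitting signal). This continues from the arrays in the sketch above, and the thresholds are arbitrary assumptions you would tune to your own loss scale:

    # Reuses train_loss / val_loss from the sketch above.
    final_gap = val_loss[-1] - train_loss[-1]

    if final_gap > 0.1:           # curves still far apart at the largest size
        print("large train/validation gap -> likely overfitting")
    elif train_loss[-1] > 0.5:    # loss is high even on the training data
        print("high training loss -> possible underfitting")
    else:
        print("curves converged at a low loss -> no clear over- or underfitting")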
