Overfitting: is it OK if I've met my desired threshold?
I've trained a LightGBM classification model, selected features, and tuned the hyperparameters, all to obtain a model that appears to work well.
When I evaluate it on an out-of-bag selection of data, it appears to be slightly overfit to the training data:
CV mean F1 score = 0.80
OOB F1 score = 0.77
To me this looks like an acceptable tolerance: for my chosen requirements, an out-of-bag score of 0.77 is perfectly acceptable.
How do we tell whether this score indicates overfitting? Does it? Or is overfitting just a concept left for us to judge for ourselves?
Is there a method to calculate an acceptable tolerance between a cross-validated and an out-of-bag evaluation metric, or is it all empirical?
Topic f1score overfitting machine-learning
Category Data Science