NGBoost and overfit - which model is used?

While training an NGBoost model I got:

[iter 0] loss=-2.2911 val_loss=-2.3309 scale=2.0000 norm=1.0976

[iter 100] loss=-3.3288 val_loss=-2.8532 scale=2.0000 norm=0.7841

[iter 200] loss=-4.0889 val_loss=-1.5779 scale=2.0000 norm=0.7544

[iter 300] loss=-4.8400 val_loss=8.8107 scale=2.0000 norm=0.6710

[iter 400] loss=-5.4463 val_loss=51.7171 scale=2.0000 norm=0.5999

It looks like overfitting set in between iterations 100 and 200. Is the best model (by val_loss) kept, or do I get the last one reported, which is massively overfit (train loss of -5.4463 vs. validation loss of 51.7171)?

If I really do get the overfitted model, how can I introduce early stopping (or model saving) based on the validation score?

Topic early-stopping ngboost overfitting

Category Data Science


No, I'm afraid you won't get the best model unless you ask for it specifically. But don't despair - pass early_stopping_rounds to fit (together with the validation set you are already supplying as X_val and Y_val), and training will stop after that number of rounds in which the validation loss got worse.

The number of rounds that works best for you has to be found experimentally (i.e., try a few values and compare the validation results, but keep in mind that tuning against the same validation set can itself overfit to it).

PS. The comments in the code claim that this number of rounds is consecutive, but looking at the code, it does not look like the count is reset each time the validation loss improves (val_loss_list is not reset).

You could change this manually by editing ngboost.py.
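The difference between a consecutive count and one that is never reset can be illustrated in plain Python. This is a hypothetical sketch, not NGBoost's actual code: the function names and the loss sequence are made up for the example.

```python
def stop_round_consecutive(val_losses, patience):
    """Stop after `patience` non-improving rounds in a row.

    Returns the index of the round at which training stops,
    or None if the stopping condition is never met.
    """
    best = float("inf")
    bad_rounds = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_rounds = 0  # reset the counter on every improvement
        else:
            bad_rounds += 1
            if bad_rounds >= patience:
                return i
    return None


def stop_round_cumulative(val_losses, patience):
    """Stop after `patience` non-improving rounds in total (no reset)."""
    best = float("inf")
    bad_rounds = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best = loss  # counter is NOT reset here
        else:
            bad_rounds += 1
            if bad_rounds >= patience:
                return i
    return None


# A noisy but steadily improving validation loss (made-up numbers).
losses = [1.0, 0.9, 0.95, 0.8, 0.85, 0.7, 0.75, 0.6]

print(stop_round_consecutive(losses, patience=3))  # None
print(stop_round_cumulative(losses, patience=3))   # 6
```

With a noisy validation loss, the version without a reset stops while the model is still improving, so the choice between the two matters when picking the number of rounds.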
