GBM: small change in the trainset causes radical change in predictions

Question

Charles_de_Montigny

March 10, 2022, 15:04

I have built a model on transaction data to predict the value of future transactions. The main algorithm is a Gradient Boosting Machine (GBM). Overall accuracy on the test set is fine and there is no sign of overfitting. However, a small change in the training set causes a radical change in the model and in its predictions. Overall accuracy, by contrast, stays stable even when the test set changes slightly.

The data covers 2005 to today, and when a single day is added to the dataset the predictions change drastically (e.g. +/- 10%). If the model is trained multiple times on the same training set, the predictions are identical.
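To put a number on the drift described above, one option is to score the same hold-out rows with the model trained before and after the extra day and compare the two prediction vectors. The following is a minimal sketch, assuming the predictions are already available as plain lists of floats. `prediction_shift` is a hypothetical helper, not part of LightGBM or XGBoost, and the numbers are made up for illustration.

```python
from statistics import mean


def prediction_shift(pred_old, pred_new, eps=1e-12):
    """Return (mean, max) absolute relative change between two prediction lists.

    Both lists must score the same rows in the same order.
    `eps` guards against division by zero when an old prediction is 0.
    """
    if len(pred_old) != len(pred_new):
        raise ValueError("both prediction lists must score the same rows")
    rel = [
        abs(new - old) / max(abs(old), eps)
        for old, new in zip(pred_old, pred_new)
    ]
    return mean(rel), max(rel)


# Illustrative values only: the same four rows scored by two models.
old = [100.0, 200.0, 50.0, 80.0]
new = [110.0, 180.0, 50.0, 88.0]
mean_shift, max_shift = prediction_shift(old, new)
print(f"mean shift: {mean_shift:.1%}, max shift: {max_shift:.1%}")
```

Tracking both the mean and the maximum helps separate a broad shift across all rows from a few rows that move a lot.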

I have tested LightGBM (2.1.0) and XGBoost (0.60) with Python 3.6 on Windows 10. A seed is set and I train the model on CPUs. I have tried increasing the number of iterations to a high value and setting a specific seed for the bagging parameters.

This blog post briefly discusses the issue without giving any solutions.

Topic xgboost gbm python

Category Data Science

vico · Accepted Answer · May 28, 2019, 08:12

A good way to avoid this problem is to add noise to your training dataset. This will make your model more robust and less volatile.

There are different ways of adding noise. Gaussian noise is a common choice, but the right one depends on the kind of data you have.
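As a rough illustration of the Gaussian option, the sketch below perturbs each numeric feature column with zero-mean noise whose standard deviation is a fraction of that column's own standard deviation. `add_gaussian_noise`, `noise_scale`, and the default of 0.05 are assumptions made for this example, not values from the question or from any library; the right scale would have to be tuned by validation.

```python
import random
from statistics import pstdev


def add_gaussian_noise(rows, noise_scale=0.05, seed=0):
    """Return a noisy copy of `rows` (a list of numeric feature rows).

    Each column gets zero-mean Gaussian noise with standard deviation
    noise_scale * (population std of that column). Constant columns have
    std 0 and are therefore left unchanged. The input is not modified.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    columns = list(zip(*rows))
    sigmas = [noise_scale * pstdev(col) for col in columns]
    return [
        [x + rng.gauss(0.0, s) for x, s in zip(row, sigmas)]
        for row in rows
    ]


# Toy feature matrix (illustrative values only); last column is constant.
X_train = [
    [10.0, 1.0, 7.0],
    [12.0, 3.0, 7.0],
    [11.0, 2.0, 7.0],
    [15.0, 5.0, 7.0],
]
X_noisy = add_gaussian_noise(X_train, noise_scale=0.05, seed=42)
for original, noisy in zip(X_train, X_noisy):
    print(original, "->", [round(v, 3) for v in noisy])
```

The noise would normally be applied to the training features only, leaving the target and the test set untouched. Categorical or integer-coded columns should be excluded, since perturbing them changes their meaning.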
