GBM: small change in the training set causes radical change in predictions
I have built a model on transaction data to predict the value of future transactions. The main algorithm is a Gradient Boosting Machine (GBM). The overall accuracy on the test set is fine and there is no sign of overfitting. However, a small change in the training set causes a radical change in the model and in its predictions. By contrast, when the test set changes a little, the overall accuracy remains stable.
The data covers the period from 2005 to today, and when a single day is added to the dataset the predictions change drastically (e.g. +/- 10%). If training is performed multiple times on the same training set, the predictions are identical.
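To make "change drastically" concrete, here is a minimal sketch of how the drift between two sets of predictions could be quantified. The function names, the 10% threshold default, and the sample numbers are all hypothetical and only illustrate the comparison; they are not real results from the model.

```python
from statistics import mean


def relative_changes(preds_before, preds_after):
    """Per-row relative change between two prediction vectors for the same rows."""
    if len(preds_before) != len(preds_after):
        raise ValueError("prediction vectors must have the same length")
    changes = []
    for before, after in zip(preds_before, preds_after):
        if before == 0:
            continue  # relative change is undefined for a zero baseline
        changes.append((after - before) / abs(before))
    return changes


def drift_summary(preds_before, preds_after, threshold=0.10):
    """Summarise how far predictions moved between two training runs."""
    abs_changes = [abs(c) for c in relative_changes(preds_before, preds_after)]
    if not abs_changes:
        raise ValueError("no rows with a non-zero baseline prediction")
    return {
        "mean_abs_change": mean(abs_changes),
        "max_abs_change": max(abs_changes),
        # Fraction of rows whose prediction moved by more than `threshold`.
        "share_above_threshold": sum(c > threshold for c in abs_changes) / len(abs_changes),
    }


# Made-up predictions for the same five test rows from two models:
# one trained on data up to day D, the other on data up to day D + 1.
preds_day_d = [100.0, 250.0, 80.0, 40.0, 500.0]
preds_day_d_plus_1 = [112.0, 220.0, 80.0, 41.0, 500.0]

summary = drift_summary(preds_day_d, preds_day_d_plus_1)
print(summary)
```

Looking at the share of rows above the threshold, rather than only the aggregate accuracy, shows whether the instability is spread over all rows or concentrated in a few.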
I have tested LightGBM (2.1.0) and XGBoost (0.60) with Python 3.6 on Windows 10. A seed is set and the model is trained on CPUs. I have tried increasing the number of iterations to a high value and adding a specific seed to the bagging parameters.
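A minimal sketch of what such a seeded LightGBM parameter set could look like. The helper name `seeded_lgbm_params` and every value shown are illustrative assumptions, not the exact configuration used; the parameter names themselves (`seed`, `bagging_seed`, `feature_fraction_seed`, `bagging_fraction`, `bagging_freq`, `num_threads`) are standard LightGBM parameters.

```python
def seeded_lgbm_params(seed):
    """Build a LightGBM parameter dict with every random component pinned to `seed`."""
    return {
        "objective": "regression",
        "boosting_type": "gbdt",
        "seed": seed,                   # master seed
        "bagging_seed": seed,           # seed for row subsampling
        "feature_fraction_seed": seed,  # seed for column subsampling
        "bagging_fraction": 0.8,        # illustrative value
        "bagging_freq": 1,              # bagging is only active when this is > 0
        "num_threads": 1,               # single thread, to rule out thread-order effects
    }


params = seeded_lgbm_params(42)
# The dict would then be passed to training, e.g. lightgbm.train(params, train_set).
```

Pinning the seeds only makes repeated runs on identical data reproducible; it does not by itself make the model stable when the data changes.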
This blog post briefly discusses the issue without giving any solutions.