Analysis of prediction shift problem in gradient boosting
I was going through Section 4.1 of the CatBoost paper, where the authors present an 'Analysis of prediction shift' using an example with two features that are Bernoulli random variables. I am unable to wrap my head around the experimental setup. Since there are only two indicator features, there can be only 4 distinct data points; everything else is duplication. They mention that for training data points the output of the first estimator of the boosting model is biased, but for test data points it is not. Are they selecting 3 points for training and the 4th one for testing? Could someone please explain the setup to me? I would be really grateful. Thank you!
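To make my confusion concrete, here is a minimal simulation sketch of how I currently understand the setup (the constants `c1`, `c2`, the sample size `n`, and fitting a single decision stump on `x1` with learning rate 1 are my own assumptions for illustration, not details I am certain the paper uses):

```python
import numpy as np

rng = np.random.default_rng(0)
c1, c2, n = 1.0, 1.0, 100

# Two independent Bernoulli(1/2) features; the target is deterministic.
X = rng.integers(0, 2, size=(n, 2))
y = c1 * X[:, 0] + c2 * X[:, 1]

# First boosting step (my assumption): a decision stump on x1 that
# predicts the mean target within each x1 bucket.
h1 = np.array([y[X[:, 0] == v].mean() for v in (0, 1)])
pred1 = h1[X[:, 0]]
residuals = y - pred1

# There are only 4 distinct points, but each appears many times; the
# randomness is in how many samples land in each of the 4 cells.
print(np.unique(X, axis=0))
```

So even with 4 distinct points, the bucket means (and hence the residuals fed to the next estimator) fluctuate from dataset to dataset. Is the train/test distinction about whether a point was among the `n` samples used to compute those bucket means, rather than about holding out one of the 4 distinct points?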
Topic gradient-boosting-decision-trees catboost gbm
Category Data Science