Why is each successive tree in GBM fit on the negative gradient of the loss function?
Page 359 of The Elements of Statistical Learning (2nd edition) describes this. Can someone explain the intuition and put it in layman's terms?
Questions
- What is the reasoning/intuition and math behind fitting each successive tree in GBM on the negative gradient of the loss function? (A minimal sketch of my current understanding is below.)
- Is it done to make GBM generalize better to an unseen test dataset? If so, how does fitting on the negative gradient achieve this generalization on test data?
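For concreteness, here is a minimal sketch of what I understand the algorithm to do, assuming squared-error loss, a synthetic one-dimensional dataset, and scikit-learn's DecisionTreeRegressor as the base learner (these choices are mine for illustration, not the book's exact setup). For this loss the negative gradient is just the ordinary residual y - f(x), and each new tree is fit on those residuals:

```python
# Minimal gradient tree boosting sketch under squared-error loss.
# Synthetic data and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

n_trees = 50
learning_rate = 0.1

# Start from a constant prediction (the mean minimizes squared error).
f = np.full_like(y, y.mean())
trees = []

for m in range(n_trees):
    # Negative gradient of 0.5 * (y - f)^2 with respect to f is (y - f),
    # i.e. the ordinary residuals -- the "pseudo-residuals" that each
    # successive tree is fit on.
    pseudo_residuals = y - f

    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, pseudo_residuals)
    trees.append(tree)

    # Take a small step in the direction suggested by the new tree.
    f += learning_rate * tree.predict(X)

print("training MSE after boosting:", np.mean((y - f) ** 2))
```

My question is why this negative-gradient step is the right thing to fit each tree on in general (for losses other than squared error), and whether that choice has anything to do with generalization.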
Topic loss-function gbm gradient-descent optimization machine-learning
Category Data Science