Example for Boosting

Can someone tell me exactly how boosting, as implemented by LightGBM or XGBoost, works in a real scenario? I know it splits the tree leaf-wise instead of level-wise, so each split is chosen by its contribution to the global loss rather than just the loss of one branch, which helps it reach a lower error faster than a level-wise tree.

But I cannot understand it completely until I see a real example. I have looked at so many articles and videos, but everywhere it's theoretical. If someone can share a small working example, or an article with one, that would be really helpful.

Thank you so much.

Topic gradient-boosting-decision-trees lightgbm boosting xgboost machine-learning

Category Data Science


Decision tree algorithms before LightGBM (whether bagging, like Random Forest, or boosting, like XGBoost) follow level-wise tree growth. In level-wise growth, once you find the best node to split, you grow a whole new level of the tree. The other splits made at that level may not lead to optimal results, so the tree grows unnecessarily just to stay symmetric.

In LightGBM, leaf-wise tree growth finds the leaf whose split will reduce the loss the most, splits only that leaf, and does not bother with the rest of the leaves at the same level. This results in an asymmetric tree, where subsequent splitting can very well happen on only one side of the tree.

The leaf-wise growth strategy tends to achieve lower loss than the level-wise strategy, but it also tends to overfit, especially on small datasets. On small datasets, level-wise growth acts as a kind of regularization that restricts the complexity of the tree, whereas leaf-wise growth is greedier.

Source : https://deep-and-shallow.com/2020/02/21/the-gradient-boosters-iii-lightgbm/
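
To see the difference in practice, here is a minimal sketch, assuming the lightgbm R package is installed (the agaricus toy dataset ships with the package; the parameter values are only illustrative, not recommendations). With leaf-wise growth the complexity is controlled mainly by num_leaves, while capping max_depth mimics the effect of level-wise growth:

library(lightgbm)

# Toy binary classification data bundled with the package
data(agaricus.train, package = "lightgbm")
dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)

# Leaf-wise growth: limit the number of leaves, leave the depth unrestricted
params_leafwise <- list(objective = "binary", num_leaves = 8, max_depth = -1)

# Restricting the depth instead approximates level-wise behaviour
params_depthcapped <- list(objective = "binary", num_leaves = 64, max_depth = 3)

model_leaf  <- lgb.train(params = params_leafwise,   data = dtrain, nrounds = 20)
model_depth <- lgb.train(params = params_depthcapped, data = dtrain, nrounds = 20)

With num_leaves = 8 each boosting round can spend its whole budget down one deep branch, whereas max_depth = 3 forces the shallower, more symmetric shape that the level-wise picture describes.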


Please refer to this link in order to understand boosting with a worked out example.

https://xgboost.readthedocs.io/en/latest/tutorials/model.html


I think what you are actually asking is "how does boosting work?" LightGBM and XGBoost are implementations of boosting algorithms.

I like the article by Bühlmann and Hothorn. They provide a very good overview of boosting options.

P. Bühlmann and T. Hothorn (2007), "Boosting Algorithms: Regularization, Prediction and Model Fitting", Statistical Science 22(4), pp. 477–505.

In essence, when you do boosting you repeatedly try to "explain" the residual error. By doing so you come closer to a solution little by little, even if each single attempt to explain the residual is "weak".

You may also get the benefit that, by repeatedly trying to explain the residual, boosting discovers interesting aspects of the data (you are, in a sense, trying to learn slowly). This works well with tree-based models and with some random element ("stochastic" gradient boosting), e.g. by leaving out some explanatory variables or some of the rows of the training set in each step.

Here you also have a similarity to random forests, where "bagging" and randomization are key. In tree-based boosting, "shallow" decision trees are grown (usually with "only" four to eight splits or so) to repeatedly explain the residual and achieve a good fit to the data.
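
As a small illustration of those two ideas (shallow trees plus row/column subsampling), here is a hedged sketch using the xgboost R package; the data and parameter values below are made up for the example:

library(xgboost)

# Simulated regression data (illustrative only)
set.seed(1)
x <- matrix(rnorm(500 * 5), ncol = 5)
y <- x[, 1] + 0.5 * x[, 2]^2 + rnorm(500, sd = 0.3)
dtrain <- xgb.DMatrix(data = x, label = y)

# Shallow trees (max_depth), slow learning (eta), and random row/column
# subsampling give the "stochastic" gradient boosting flavour
params <- list(
  objective = "reg:squarederror",
  max_depth = 3,          # shallow trees
  eta = 0.05,             # learning rate: learn slowly
  subsample = 0.7,        # use 70% of the rows per boosting round
  colsample_bytree = 0.7  # use 70% of the columns per tree
)

model <- xgb.train(params = params, data = dtrain, nrounds = 200)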

I implemented "linear" boosting from the paper linked above (from Section 3.3 p. 483, Bühlman/Horthon) in R, you can find the code here.

The main steps are (repeatedly):

  • Get the residual from the current estimator
  • Fit the residual and predict
  • Update the loss

You may try to replace the linear model in the "fit" part of the code below with a shallow tree and see what happens (a sketch of that variant follows the code).


# Boosting (p. 483, Sec. 3.3, L2-Boosting)
library(glmnet)

# Example setup so the snippet runs on its own (toy data; values are illustrative)
set.seed(1)
x <- matrix(rnorm(200 * 5), ncol = 5)
y <- as.numeric(x %*% c(1, -2, 0.5, 0, 0) + rnorm(200))
f0      <- mean(y)   # initial estimate
nu      <- 0.1       # learning rate (step size)
maxiter <- 100       # maximum number of boosting steps
es      <- 1e-3      # early-stopping threshold on the SSR
ssrlist <- list()
bstep   <- list()

for (n in seq_len(maxiter)) {

  # I) Get residual
  if (n == 1) { f <- f0 }  # initialize in the first step
  u <- y - f               # residual of the current estimate

  # II) Fit the residual and predict; ridge regression (alpha = 0)
  reg <- glmnet(x, u, alpha = 0, lambda = 2)
  g   <- predict(reg, newx = x, type = "link")

  # III) Update f with a small step in the direction of the fitted residual
  f <- f + nu * g
  # Print feedback
  cat(paste0("Step: ", n, " SSR: ", sum(u^2), "\n"))

  # Save SSR per iteration
  ssrlist[[n]] <- sum(u^2)
  bstep[[n]]   <- n
  # Early stopping rule
  if (sum(u^2) < es) { break }
}
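
And, following the suggestion above, a hedged sketch of what step II could look like with a shallow tree instead of ridge regression, using the rpart package (maxdepth = 2 is just an illustrative choice):

library(rpart)

# II) Fit the residual with a shallow regression tree and predict
df  <- data.frame(x, u = as.numeric(u))
reg <- rpart(u ~ ., data = df, control = rpart.control(maxdepth = 2))
g   <- predict(reg, newdata = df)

Dropping these lines into the loop in place of the glmnet/predict lines turns the linear L2-boosting into a small tree-based booster.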
