LightGBM vs XGBoost vs CatBoost

I've seen that in Kaggle competitions people now use LightGBM where they used to use XGBoost. My question is: when would you rather use XGBoost instead of LightGBM? What about CatBoost?

Topic catboost lightgbm xgboost kaggle machine-learning

Category Data Science


On Kaggle, LightGBM is indeed the "meta" base learner in almost all competitions with structured datasets right now. This is mostly down to LightGBM's implementation: instead of searching exhaustively for the optimal split, as XGBoost's exact greedy method does, it finds splits through histogram approximations. (XGBoost now offers a histogram-based method as well, but it is still not as fast as LightGBM.) The result is a slight decrease in predictive performance in exchange for a much larger speedup in training. That leaves more time for feature engineering, experimentation, and model tuning, all of which are key to winning Kaggle competitions, and which tend to yield larger gains in predictive performance than the histogram approximation gives up.
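To make the exact-versus-histogram trade-off concrete, here is a rough pure-Python sketch of split finding on a single feature. It is not how either library is actually implemented: the function names, the equal-width binning, and the squared-error gain are my own simplifying assumptions (the real libraries work on gradient statistics and use smarter binning). The point is only that the histogram version scores a handful of bin boundaries instead of every distinct value.

```python
import random

def sse_gain(left_sum, left_n, total_sum, total_n):
    """Reduction in squared error from a split (larger is better)."""
    right_sum, right_n = total_sum - left_sum, total_n - left_n
    if left_n == 0 or right_n == 0:
        return float("-inf")  # a split must leave points on both sides
    return (left_sum ** 2 / left_n + right_sum ** 2 / right_n
            - total_sum ** 2 / total_n)

def exact_split(x, y):
    """Exact search: score a threshold between every pair of distinct sorted values."""
    pairs = sorted(zip(x, y))
    total_sum, total_n = sum(y), len(y)
    best_gain, best_thr, candidates = float("-inf"), None, 0
    left_sum = 0.0
    for i in range(total_n - 1):
        left_sum += pairs[i][1]
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # cannot split between equal feature values
        candidates += 1
        gain = sse_gain(left_sum, i + 1, total_sum, total_n)
        if gain > best_gain:
            best_gain, best_thr = gain, (pairs[i][0] + pairs[i + 1][0]) / 2
    return best_thr, best_gain, candidates

def histogram_split(x, y, n_bins=16):
    """Histogram search: bucket values into bins, then score only bin boundaries."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
    bin_sum, bin_n = [0.0] * n_bins, [0] * n_bins
    for xi, yi in zip(x, y):
        b = min(int((xi - lo) / width), n_bins - 1)
        bin_sum[b] += yi
        bin_n[b] += 1
    total_sum, total_n = sum(y), len(y)
    best_gain, best_thr = float("-inf"), None
    left_sum, left_n = 0.0, 0
    for b in range(n_bins - 1):
        left_sum += bin_sum[b]
        left_n += bin_n[b]
        gain = sse_gain(left_sum, left_n, total_sum, total_n)
        if gain > best_gain:
            best_gain, best_thr = gain, lo + (b + 1) * width
    return best_thr, best_gain, n_bins - 1

# Toy data: a noisy step at x = 6.3
random.seed(0)
x = [random.uniform(0, 10) for _ in range(2000)]
y = [(1.0 if xi > 6.3 else 0.0) + random.gauss(0, 0.1) for xi in x]

print("exact    :", exact_split(x, y))
print("histogram:", histogram_split(x, y))
```

On this toy data the histogram search looks at 15 candidate thresholds rather than roughly 2000, and its chosen threshold can only be as precise as the bin width allows. That is the "slight decrease in predictive performance for a much larger speed increase" in miniature.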

CatBoost is not used as much because, on average, it is found to be much slower than LightGBM. That said, CatBoost implements gradient boosting differently, which at times gives slightly more accurate predictions, particularly when you have a large number of categorical features. I have never used CatBoost myself, so I encourage you to read the CatBoost paper. Regardless, because rapid experimentation is vital in Kaggle competitions, LightGBM tends to be the go-to algorithm when first creating strong base learners.

In general, it is worth noting that many of the approaches I've seen combine all three boosting algorithms in a model stack (i.e. ensembling). LightGBM, CatBoost, and XGBoost might be used together as three base learners and then combined via a GLM or a neural network. This is done to squeeze out the last decimal places on the leaderboard, so I doubt there is any theoretical (or practical) justification for it outside competitions.
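For anyone unfamiliar with stacking, the idea can be sketched in plain Python. Everything here is a stand-in I made up for illustration: `fit_linear`, `fit_stump`, and `fit_knn` play the role of the three boosting libraries, and the meta-learner is ordinary least squares without an intercept (about the simplest GLM there is). The part that carries over to a real stack is the structure: each base learner produces out-of-fold predictions, and the meta-learner is trained on those predictions rather than on the raw features.

```python
import math
import random

def kfold_indices(n, k):
    """Yield (train_idx, valid_idx) for k contiguous folds (data assumed shuffled)."""
    edges = [i * n // k for i in range(k + 1)]
    for f in range(k):
        valid = list(range(edges[f], edges[f + 1]))
        train = [i for i in range(n) if i < edges[f] or i >= edges[f + 1]]
        yield train, valid

def solve_least_squares(rows, targets):
    """Solve the normal equations (A^T A) w = A^T y by Gauss-Jordan elimination."""
    m = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
           + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(m):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][m] / aug[i][i] for i in range(m)]

# Three hypothetical base learners; each returns a predict(x) function.
def fit_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

def fit_stump(xs, ys):
    thr = sorted(xs)[len(xs) // 2]  # split at the median, predict each side's mean
    left = [y for x, y in zip(xs, ys) if x < thr]
    right = [y for x, y in zip(xs, ys) if x >= thr]
    lm, rm = sum(left) / len(left), sum(right) / len(right)
    return lambda x: lm if x < thr else rm

def fit_knn(xs, ys, k=5):
    pts = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict

def stack(xs, ys, fitters, k=5):
    """Fit base learners out-of-fold, then fit linear meta-weights on their predictions."""
    n = len(xs)
    oof = [[0.0] * len(fitters) for _ in range(n)]
    for train, valid in kfold_indices(n, k):
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for j, fit in enumerate(fitters):
            model = fit(tx, ty)
            for i in valid:
                oof[i][j] = model(xs[i])  # prediction from a model that never saw row i
    weights = solve_least_squares(oof, ys)
    models = [fit(xs, ys) for fit in fitters]  # refit on all data for final use
    predict = lambda x: sum(w * m(x) for w, m in zip(weights, models))
    return predict, weights, oof

random.seed(1)
xs = [random.uniform(-3, 3) for _ in range(150)]
ys = [math.sin(x) + 0.5 * x + random.gauss(0, 0.2) for x in xs]
predict, weights, oof = stack(xs, ys, [fit_linear, fit_stump, fit_knn])
print("meta weights:", weights)
print("prediction at x=1.0:", predict(1.0))
```

In an actual competition you would swap the three stand-ins for the real libraries and probably use an off-the-shelf stacking utility instead of hand-rolling the folds, but the shape of the pipeline is the same.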
