Using the LightGBM regressor, I trained my model and used grid search to find the best parameters, but when testing with those best parameters I get different results each time; the model produces different results on every test iteration. I ran LightGBM twice with the same parameters and got different validation results. The only random seed parameter I found was baggingSeed, but even after fixing baggingSeed the problem still occurred. Should I fix any …
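In case it helps, here is a minimal sketch of pinning every source of randomness I'm aware of in the LightGBM Python API (names may differ slightly in other wrappers); with bagging or feature subsampling enabled, a bagging seed alone is not enough, and multi-threading can also change floating-point summation order.

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X[:, 0] * 2.0 + rng.normal(size=1000)

# Pin every seed, not just the bagging one (parameter names per the Python API).
params = {
    "objective": "regression",
    "seed": 42,                   # master seed
    "bagging_seed": 42,
    "feature_fraction_seed": 42,
    "data_random_seed": 42,
    "deterministic": True,        # trade some speed for reproducibility
    "force_row_wise": True,       # stop the row/col-wise auto-selection from varying
    "num_threads": 1,             # threaded histogram sums can differ in float order
    "verbose": -1,
}

booster_a = lgb.train(params, lgb.Dataset(X, y), num_boost_round=50)
booster_b = lgb.train(params, lgb.Dataset(X, y), num_boost_round=50)
assert np.allclose(booster_a.predict(X), booster_b.predict(X))
```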
Page 359 of The Elements of Statistical Learning, 2nd edition, says the following. Can someone explain the intuition and simplify it in layman's terms? Questions: What is the reason/intuition and math behind fitting each successive tree in GBM on the negative gradient of the loss function? Is it done to make GBM generalize better to unseen test data? If so, how does fitting on the negative gradient achieve this generalization on test data?
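For what it's worth, here is the functional-gradient-descent reading in compact form (my own sketch, not ESL's exact notation). The key observation is that for squared error the negative gradient is just the residual, so "fit the negative gradient" generalizes "fit the residuals" to arbitrary differentiable losses.

```latex
r_{im} = -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{m-1}},
\qquad
h_m \approx \arg\min_{h}\sum_{i=1}^{n}\bigl(r_{im} - h(x_i)\bigr)^2,
\qquad
F_m(x) = F_{m-1}(x) + \nu\, h_m(x).
% For L(y,F) = \tfrac{1}{2}(y-F)^2 this gives r_{im} = y_i - F_{m-1}(x_i), the plain residual.
```

As I understand it, the negative-gradient step is about descending the training loss in function space; any generalization to test data comes from the usual regularizers (shallow trees, shrinkage $\nu$, subsampling), not from the gradient step itself.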
I was going through Section 4.1 of the CatBoost paper, where they discuss the 'Analysis of prediction shift' using an example with 2 features that are Bernoulli random variables. I am unable to wrap my head around the experimental setup. Since there are only 2 indicator features, there can be only 4 distinct data points; everything else is duplication. They mention that for training data points the output of the first estimator of the boosting model is biased, …
I have built a model using transaction data, trying to predict the value of future transactions. The main algorithm is the Gradient Boosting Machine. The overall accuracy on the test set is fine and there is no sign of overfitting. However, a small change in the training set causes a radical change in the model and in the predictions, yet even when the test set changes a little, the overall accuracy is stable. The time period is from 2005 to today and when a …
For professional reasons I want to learn and understand random forests. I am unsure whether my understanding is correct or whether I am making logical errors. I have a data set with 15 million entries and want to run a regression for a numerical target (time). The data structure is: 7 categorical variables, 1 date, and 4 numerical features. After data preparation I split the data into training and test sets. Then I defined a gradient …
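For what it's worth, a minimal sketch of the kind of setup described, with hypothetical column names and a small synthetic frame standing in for the 15 million rows; LightGBM is used here because it accepts pandas categorical columns directly and scales to data of that size.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import train_test_split

# Small synthetic stand-in: 7 categoricals, 1 date, 4 numericals, numeric target "time".
n = 10_000
rng = np.random.default_rng(0)
df = pd.DataFrame({f"cat{i}": rng.integers(0, 20, n).astype(str) for i in range(7)})
for i in range(4):
    df[f"num{i}"] = rng.normal(size=n)
df["date"] = pd.Timestamp("2020-01-01") + pd.to_timedelta(rng.integers(0, 365, n), unit="D")
df["time"] = 10 + 3 * df["num0"] + rng.gamma(2.0, 1.0, n)

# The raw date is not fed to the model; derive simple numeric parts instead.
df["month"] = df["date"].dt.month
df["dayofweek"] = df["date"].dt.dayofweek

cat_cols = [f"cat{i}" for i in range(7)]
df[cat_cols] = df[cat_cols].astype("category")   # LightGBM consumes these natively
features = cat_cols + [f"num{i}" for i in range(4)] + ["month", "dayofweek"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["time"], test_size=0.2, random_state=0
)
model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05, random_state=0)
model.fit(X_train, y_train)
print("test R^2:", model.score(X_test, y_test))
```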
I have more of a conceptual question that I was hoping to get some feedback on. I am trying to run a boosted regression ML model to identify a subset of important predictors for some clinical condition. The dataset includes over 100,000 rows and close to 1,000 predictors. Now, the etiology of the disease we are trying to predict is largely unknown, so we likely don't have data on many important predictors for the condition. That is to say, as a …
I am looking for a machine learning textbook that gives a detailed derivation of gradient boosting, with all the mathematics behind it. I would be happy to receive recommendations.
I am new to GBM and xgboost, and am currently using xgboost_0.6-2 in R. The modeling runs well with the standard objective function "objective" = "reg:linear", and after reading this NIH paper I wanted to run a quantile regression using a custom objective function, but it iterates exactly 11 times and the metric does not change. I simply switched out the 'pred' statement following the GitHub xgboost demo, but I am afraid it is more complicated than that and I …
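Not the R code from the question, but a minimal Python sketch of a pinball-loss custom objective built on the same idea; a common gotcha is that the true second derivative of the quantile loss is zero, so a constant Hessian has to be substituted or the updates stall. The helper name and parameter choices below are illustrative only.

```python
import numpy as np
import xgboost as xgb

def make_quantile_objective(alpha: float):
    """Custom pinball-loss objective for a target quantile `alpha`."""
    def objective(preds, dtrain):
        errors = dtrain.get_label() - preds
        # d(pinball)/d(pred): -alpha when under-predicting, (1 - alpha) otherwise.
        grad = np.where(errors > 0, -alpha, 1.0 - alpha)
        # The true second derivative is 0; use a constant so steps are not degenerate.
        hess = np.ones_like(preds)
        return grad, hess
    return objective

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = X[:, 0] + rng.normal(scale=0.5, size=2000)
dtrain = xgb.DMatrix(X, label=y)

booster = xgb.train(
    {"max_depth": 3, "eta": 0.1},
    dtrain,
    num_boost_round=200,
    obj=make_quantile_objective(0.9),
)
print(np.mean(booster.predict(dtrain) >= y))  # roughly 0.9 if the fit worked
```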
Scikit-learn GradientBoostingRegressor: I was looking at the scikit-learn documentation for GradientBoostingRegressor. It says that we can use 'ls' as a loss function, meaning least squares regression. But I am confused, since least squares regression is a method that minimizes the SSE loss function. So shouldn't they mention SSE here?
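A one-line way to see why the naming does not matter much (my own note, not from the docs): scaling the objective by a constant does not change the minimizer, so "least squares", SSE, and MSE all describe the same fit.

```latex
\arg\min_{F}\sum_{i=1}^{n}\bigl(y_i - F(x_i)\bigr)^2
\;=\; \arg\min_{F}\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - F(x_i)\bigr)^2
\;=\; \arg\min_{F}\frac{1}{2}\sum_{i=1}^{n}\bigl(y_i - F(x_i)\bigr)^2 .
```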
In the tutorial on boosting from an existing prediction in the LightGBM R package, there is an init_score parameter in the setinfo function. I am wondering what init_score means. The help page says: init_score: initial score is the base prediction lightgbm will boost from. Another question: what does "boost" mean in LightGBM?
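I'm more familiar with the Python API, but the idea is the same as setinfo in R: init_score supplies a per-row starting margin that the boosting rounds then correct ("boost from"), and, as far as I understand, predict() later returns only the boosted part, so the init_score has to be added back. A sketch:

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = 3.0 + X[:, 0] + rng.normal(scale=0.1, size=1000)

# Pretend these margins came from an earlier model; boosting continues from them.
base_margin = np.full(len(y), y.mean())

dtrain = lgb.Dataset(X, label=y, init_score=base_margin)
booster = lgb.train({"objective": "regression", "verbose": -1}, dtrain, num_boost_round=50)

# As far as I understand, predict() returns only the boosted correction,
# so the init_score has to be added back by hand.
preds = base_margin + booster.predict(X)
print("MSE:", np.mean((preds - y) ** 2))
```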
As far as I know, to train learning-to-rank models, you need three things in the dataset: a label or relevance, a group or query id, and a feature vector. For example, the Microsoft Learning to Rank dataset uses this format (label, group id, and features). 1 qid:10 1:0.031310 2:0.666667 ... 0 qid:10 1:0.078682 2:0.166667 ... I am trying out XGBoost, which utilizes GBMs to do pairwise ranking. They have an example for a ranking task that uses the C++ program …
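Not the C++ demo, but in case it helps, a minimal Python sketch of the same idea: the qid/group information enters XGBoost's Python API through DMatrix.set_group, and rank:pairwise produces per-document scores that you sort within each query. The group sizes and labels below are made up.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
# Three queries with 4, 5 and 3 documents; rows must be grouped/sorted by query.
group_sizes = [4, 5, 3]
X = rng.normal(size=(sum(group_sizes), 10))
y = rng.integers(0, 3, size=sum(group_sizes))       # graded relevance labels

dtrain = xgb.DMatrix(X, label=y)
dtrain.set_group(group_sizes)                        # the "qid" information in the Python API

params = {"objective": "rank:pairwise", "eta": 0.1, "max_depth": 4}
ranker = xgb.train(params, dtrain, num_boost_round=50)

scores = ranker.predict(dtrain)                      # per-document scores; sort within a query
print(scores[:4])                                    # scores for the first query's documents
```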
I'm currently using XGBoost on a dataset with 21 features (selected from a list of some 150 features), which I then one-hot encoded to obtain ~98 features. A few of these 98 features are somewhat redundant; for example, a variable (feature) $A$ also appears as $\frac{B}{A}$ and $\frac{C}{A}$. My questions are: How (if at all) do boosted decision trees handle multicollinearity? How would the existence of multicollinearity affect prediction if it is not handled? From what I understand, the model is learning more …
Is there an algorithm out there that creates a random forest but then prunes all the leaves whose impurity measure is above a threshold that I determine? In other words, if I set the minimum samples per leaf to 500 and require leaves to be at least 90% pure, for example, the algorithm would keep only the leaves that respect these parameters. My dataset is extremely noisy, so most leaves have a Gini impurity around 0.5 but …
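As far as I know there is no built-in "prune leaves above a Gini threshold" option in scikit-learn, but something close can be bolted on after training: look up which leaf each row lands in, check that leaf's impurity, and let only sufficiently pure leaves vote. A rough sketch (the threshold and the abstain logic are illustrative choices, not a standard API):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Noisy binary problem with large leaves, as in the question.
X, y = make_classification(n_samples=5000, n_features=20, flip_y=0.3, random_state=0)
forest = RandomForestClassifier(
    n_estimators=100, min_samples_leaf=500, random_state=0
).fit(X, y)

max_gini = 0.18          # ~90% purity for two classes: 1 - (0.9**2 + 0.1**2) = 0.18
leaf_ids = forest.apply(X)                     # (n_samples, n_trees) leaf index per tree

votes = np.zeros(len(X))
counts = np.zeros(len(X))
for j, est in enumerate(forest.estimators_):
    leaf_impurity = est.tree_.impurity[leaf_ids[:, j]]   # Gini of the leaf each row fell into
    keep = leaf_impurity <= max_gini                     # let only "pure enough" leaves vote
    proba = est.predict_proba(X)[:, 1]
    votes[keep] += proba[keep]
    counts[keep] += 1

confident = counts > 0                         # rows that got at least one accepted vote
preds = (votes[confident] / counts[confident] > 0.5).astype(int)
acc = (preds == y[confident]).mean() if confident.any() else float("nan")
print(f"kept {confident.mean():.0%} of rows; accuracy on kept rows: {acc:.3f}")
```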
I'm currently studying GBDT and started reading LightGBM's research paper. In Section 4 they explain the Exclusive Feature Bundling algorithm, which aims to reduce the number of features by grouping mutually exclusive features into bundles and treating each bundle as a single feature. The researchers emphasize that one must be able to retrieve the original values of the features from the bundle. Question: if we have a categorical feature that has been one-hot encoded, won't this algorithm simply reverse the …
Background: in xgboost, the $t$-th iteration fits a tree $f_t$ over all $n$ examples by minimizing the following objective: $$\sum_{i=1}^n\left[g_i f_t(x_i) + \frac{1}{2}h_i f_t^2(x_i)\right]$$ where $g_i, h_i$ are the first- and second-order derivatives of the loss with respect to our previous best estimate $\hat{y}$ (from iteration $t-1$): $g_i=\partial_{\hat{y}} l(y_i, \hat{y})$, $h_i=\partial^2_{\hat{y}} l(y_i, \hat{y})$, and $l$ is our loss function. The question (finally): when building $f_t$ and considering a specific feature $k$ at a specific split, they use the following heuristic to assess only some …
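Before getting to the split heuristic, it may help to note what the quoted objective gives per leaf; this is a short derivation of my own from the formula above (the actual xgboost objective also adds a regularizer, which puts a $\lambda$ in the denominators).

```latex
% Fix the tree structure: leaves I_1, ..., I_T with constant outputs w_j, so the
% objective decomposes per leaf:
\sum_{i=1}^{n}\Bigl[g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i)\Bigr]
= \sum_{j=1}^{T}\Bigl[G_j w_j + \tfrac{1}{2} H_j w_j^2\Bigr],
\qquad G_j = \sum_{i \in I_j} g_i,\quad H_j = \sum_{i \in I_j} h_i,
\qquad
w_j^{*} = -\frac{G_j}{H_j}
\;\Longrightarrow\;
\text{objective} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j}
\quad\bigl(\text{with regularization: } w_j^{*} = -\tfrac{G_j}{H_j + \lambda}\bigr).
```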
I was wondering whether there is literature on, or whether someone could explain, how to fit a single decision tree to a gradient-boosted trees classifier in order to derive more interpretable results. This is apparently the approach that Turi uses in their explain function, which outputs something like this: [Turi's explain function output, from their page here]. I know that for random forests you can average the contribution of each feature in every tree, as done in the TreeInterpreter Python package, but …
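One simple variant, which may or may not be what Turi does internally, is to distil the boosted classifier into a single shallow surrogate tree: regress its predicted probabilities on the inputs and read the surrogate as an approximate explanation. A scikit-learn sketch:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = make_classification(n_samples=5000, n_features=10, n_informative=4, random_state=0)
gbm = GradientBoostingClassifier(random_state=0).fit(X, y)

# Surrogate ("distilled") tree: regress the GBM's predicted probability on the
# same inputs, then read the shallow tree as an approximate explanation.
target = gbm.predict_proba(X)[:, 1]
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, target)

print("surrogate fidelity R^2:", surrogate.score(X, target))
print(export_text(surrogate, feature_names=[f"f{i}" for i in range(10)]))
```

The fidelity score is worth reporting alongside the explanation: if the shallow tree cannot reproduce the GBM's scores well, its splits are not a trustworthy summary of the ensemble.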
The GBM implementation in the h2o package only allows the user to specify a loss function via the distribution argument, which defaults to multinomial for categorical response variables and gaussian for numerical ones. According to the documentation, the loss functions are implied by the distributions, but I need to know which loss functions are used, and I can't find that anywhere in the documentation. I'm guessing it's the MSE for gaussian and cross-entropy for multinomial; does anybody here …
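For reference, the textbook (Friedman-style) correspondence is sketched below; whether h2o's internals match these exact forms is precisely what would need confirming in their source, so treat this as an assumption.

```latex
\text{gaussian:}\quad L\bigl(y, f(x)\bigr) = \tfrac{1}{2}\bigl(y - f(x)\bigr)^2
\qquad\qquad
\text{multinomial:}\quad L\bigl(y, p(x)\bigr) = -\sum_{k=1}^{K}\mathbf{1}\{y = k\}\,\log p_k(x)
```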
I would like to train my datasets in scikit-learn but export the final Gradient Boosting Regressor elsewhere so that I can make predictions directly on another platform. I am aware that we can obtain the individual decision trees used by the regressor by accessing the tree_ attribute of each element of regressor.estimators_. What I would like to know is how to combine these decision trees to produce the final regression prediction.
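A small sanity-check sketch of how the pieces combine in scikit-learn (the layout of estimators_ is an implementation detail that could change between versions): the final prediction is the init estimator's output plus learning_rate times the sum of the individual trees' outputs.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=2000, n_features=8, random_state=0)
reg = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0).fit(X, y)

# estimators_ has shape (n_estimators, 1) for single-output regression.
trees = reg.estimators_[:, 0]
manual = reg.init_.predict(X).ravel() + reg.learning_rate * sum(t.predict(X) for t in trees)

# For the default squared-error loss the raw score is the prediction itself.
assert np.allclose(manual, reg.predict(X))
print("reconstruction matches reg.predict")
```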
According to the documentation of the scikit-learn GradientBoostingRegressor: init: estimator or ‘zero’, default=None: An estimator object that is used to compute the initial predictions. init has to provide fit and predict. If ‘zero’, the initial raw predictions are set to zero. By default a DummyEstimator is used, predicting either the average target value (for loss=’ls’), or a quantile for the other losses. So what quantile is used by the DummyRegressor if the loss function is 'huber'? Is it the …
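Rather than guessing, one way is to inspect the fitted default init estimator directly; a minimal sketch (init_ is the documented attribute that holds it after fitting):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
reg = GradientBoostingRegressor(loss="huber").fit(X, y)

# init_ holds the fitted default init estimator; its parameters show which
# strategy/quantile was actually used for this loss.
print(reg.init_)
print(reg.init_.get_params())
```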
I wonder whether algorithms such as GBM, XGBoost, CatBoost, and LightGBM ever perform more than a two-way split at a node in their decision trees. Can a node be split into 3 or more branches instead of merely two? Can more than one feature be used in deciding how to split a node? Can a feature be re-used when splitting a descendant node?
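One way to check empirically rather than from the docs: fit a model and dump its tree structure, then look at the split records. A sketch using LightGBM's trees_to_dataframe (assuming a reasonably recent version with pandas installed); the dump shows what each node's split looks like and how often a feature reappears within a tree.

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)

booster = lgb.train(
    {"objective": "binary", "verbose": -1},
    lgb.Dataset(X, y),
    num_boost_round=20,
)

# One row per node; leaves have no split_feature. Each internal node carries a
# single split_feature and a single threshold, and the counts below show how
# often the same feature is re-used within one tree.
tree_df = booster.trees_to_dataframe()
splits = tree_df.dropna(subset=["split_feature"])
print(splits.groupby("tree_index")["split_feature"].value_counts().head(10))
```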