Why does a LightGBM model produce different results while testing?

Using the LightGBM regressor, I trained my data and used grid search to find the best parameters, but when testing with those best parameters I get different results each time, i.e. the model produces different results on every test iteration. I ran LightGBM twice with the same parameters and got different validation results. The only random-seed parameter I could find was bagging_seed, but even after fixing bagging_seed the problem remained. Should I fix any …
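For reference, a minimal Python sketch (with synthetic data standing in for the real training set) that fixes every seed-related parameter rather than only bagging_seed, plus the deterministic and single-thread settings that also affect run-to-run variation:

import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))          # synthetic stand-in for the real features
y = X[:, 0] + rng.normal(size=1000)

params = {
    "objective": "regression",
    # fix every seed-related parameter, not just bagging_seed
    "seed": 42,
    "bagging_seed": 42,
    "feature_fraction_seed": 42,
    "data_random_seed": 42,
    "deterministic": True,       # deterministic histogram construction
    "num_threads": 1,            # float summation order can differ across threads
    "verbosity": -1,
}

model = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=100)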
Category: Data Science

Why is each successive tree in GBM fit on the negative gradient of the loss function?

Page 359 of The Elements of Statistical Learning (2nd edition) says the following. Can someone explain the intuition and simplify it in layman's terms? Questions: What is the reason/intuition and the math behind fitting each successive tree in GBM on the negative gradient of the loss function? Is it done to make GBM generalize better on unseen test data? If so, how does fitting on the negative gradient achieve this generalization on test data?
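As a rough intuition: each tree approximates the steepest-descent direction of the training loss in function space, and the ensemble takes a small step along it. A toy sketch with squared-error loss, where the negative gradient is simply the residual y − F(x), on synthetic data:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=500)

# stage 0: constant prediction (the minimizer of squared loss)
pred = np.full_like(y, y.mean())

learning_rate = 0.1
for _ in range(50):
    # for L(y, F) = 1/2 (y - F)^2, the negative gradient w.r.t. F is the residual y - F
    residual = y - pred
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * tree.predict(X)   # small step "downhill" in function space

print("training MSE:", np.mean((y - pred) ** 2))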
Category: Data Science

Analysis of prediction shift problem in gradient boosting

I was going through section 4.1 of the CatBoost paper, where the authors analyze the prediction shift using an example with 2 features that are Bernoulli random variables. I am unable to wrap my head around the experimental setup. Since there are only 2 indicator features, we can have only 4 distinct data points; everything else is duplication. They mention that for the training data points the output of the first estimator of the boosting model is biased, …
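Not the paper's exact construction, but a toy simulation in the same spirit (two Bernoulli indicator features, a small training sample, a hypothetical linear target): the first estimator's residuals look optimistic on the very points it was fit on compared with fresh draws from the same distribution, which is the bias the paper is pointing at:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def draw(n):
    x1 = rng.integers(0, 2, n)      # Bernoulli(1/2) indicator features
    x2 = rng.integers(0, 2, n)
    y = 2.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)
    return np.column_stack([x1, x2]), y

X_tr, y_tr = draw(50)               # small sample, so leaf values are noisy
tree = DecisionTreeRegressor(max_depth=2).fit(X_tr, y_tr)

X_new, y_new = draw(100_000)        # fresh data from the same distribution
res_train = y_tr - tree.predict(X_tr)
res_new = y_new - tree.predict(X_new)
print("mean residual on train:", res_train.mean())   # exactly 0 by construction
print("mean squared residual, train vs fresh:",
      (res_train ** 2).mean(), (res_new ** 2).mean())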
Category: Data Science

GBM: small change in the training set causes radical change in predictions

I have built a model on transaction data to predict the value of future transactions. The main algorithm is a Gradient Boosting Machine. The overall accuracy on the test set is fine and there is no sign of overfitting. However, a small change in the training set creates a radical change in the model and in the predictions, whereas even when the test set changes a little the overall accuracy is stable. The time period is from 2005 to today, and when a …
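One way to quantify this, sketched below on synthetic data: refit with a fixed random_state while dropping a handful of training rows, so any movement in the predictions is attributable to the data change rather than to the algorithm's own randomness:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))                         # stand-in for the transaction features
y = X[:, 0] * 10 + X[:, 1] ** 2 + rng.normal(size=5000)
X_test = rng.normal(size=(1000, 8))

def fit_and_predict(drop_idx):
    keep = np.ones(len(X), dtype=bool)
    keep[drop_idx] = False                             # "small change": drop a handful of rows
    model = GradientBoostingRegressor(random_state=0)  # fixed seed isolates the data effect
    model.fit(X[keep], y[keep])
    return model.predict(X_test)

p_full = fit_and_predict(np.array([], dtype=int))
p_perturbed = fit_and_predict(rng.choice(len(X), size=25, replace=False))

# how far individual predictions move, even if aggregate accuracy barely changes
print("max abs prediction change:", np.abs(p_full - p_perturbed).max())
print("mean abs prediction change:", np.abs(p_full - p_perturbed).mean())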
Category: Data Science

Is this random forest logically correct and correctly implemented with R and gbm?

For professional reasons I want to learn and understand random forests. I am unsure whether my understanding is correct or whether I am making logical errors. I have a data set with 15 million entries and want to build a regression for a numerical target (time). The data structure is: 7 categorical variables, 1 date, and 4 numerical features. After data preparation I split the data into training and test sets. Then I defined a gradient …
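The excerpt does not show the R gbm call itself, but the described shape of the workflow (mixed categorical/numerical features, numeric target, train/test split, then gradient boosting) looks roughly like this scikit-learn sketch on synthetic data:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingRegressor

# synthetic stand-in: 7 categorical, 1 date, 4 numerical features, numeric target "time"
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({f"cat{i}": rng.integers(0, 5, n) for i in range(7)})
for i in range(4):
    df[f"num{i}"] = rng.normal(size=n)
df["day_of_year"] = rng.integers(1, 366, n)        # the date, encoded as a number
y = df["num0"] * 3 + df["cat0"] + rng.normal(size=n)

X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.2, random_state=0)

# columns 0..6 are the categorical variables
model = HistGradientBoostingRegressor(categorical_features=list(range(7)), random_state=0)
model.fit(X_train, y_train)
print("R^2 on the held-out test set:", model.score(X_test, y_test))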
Category: Data Science

Aggregation of feature importance

I have more of a conceptual question I was hoping to get some feedback on. I am trying to run a boosted regression ML model to identify a subset of important predictors for a clinical condition. The dataset includes over 100,000 rows and close to 1,000 predictors. Now, the etiology of the disease we are trying to predict is largely unknown, so we likely don't have data on many important predictors for the condition. That is to say, as a …
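One common way to make such a ranking more trustworthy is to aggregate importances over repeated fits; a sketch on synthetic data that averages feature_importances_ across bootstrap resamples and reports the variability of each estimate:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=2000, n_features=50, n_informative=8, random_state=0)

importances = []
for seed in range(10):
    # refit on a bootstrap resample so the ranking is not an artifact of a single fit
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), len(X))
    model = GradientBoostingRegressor(random_state=seed).fit(X[idx], y[idx])
    importances.append(model.feature_importances_)

mean_imp = np.mean(importances, axis=0)
std_imp = np.std(importances, axis=0)
top = np.argsort(mean_imp)[::-1][:10]
for j in top:
    print(f"feature {j}: importance {mean_imp[j]:.3f} +/- {std_imp[j]:.3f}")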
Category: Data Science

XGBoost quantile regression via custom objective

I am new to GBM and XGBoost, and am currently using xgboost_0.6-2 in R. The model runs well with the standard objective function "objective" = "reg:linear"; after reading this NIH paper I wanted to run a quantile regression using a custom objective function, but it iterates exactly 11 times and the metric does not change. I simply switched out the 'pred' statement following the GitHub xgboost demo, but I am afraid it is more complicated than that and I …
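The question concerns the R package, but the idea is the same in any binding: supply the pinball-loss gradient yourself and substitute a constant for the hessian, since the true second derivative is zero and that tends to stall or degenerate the updates. A Python sketch of such a custom objective:

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression

ALPHA = 0.9  # target quantile

def quantile_obj(preds, dtrain):
    # pinball loss: the gradient is a step function and the true hessian is 0,
    # so a constant hessian is substituted to keep the leaf updates well defined
    y = dtrain.get_label()
    err = preds - y
    grad = np.where(err >= 0.0, 1.0 - ALPHA, -ALPHA)
    hess = np.ones_like(err)
    return grad, hess

X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                    num_boost_round=200, obj=quantile_obj)

pred = booster.predict(dtrain)
print("empirical coverage:", np.mean(y <= pred))   # should land near ALPHA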
Category: Data Science

Loss function in GradientBoostingRegressor

Scikit-learn GradientBoostingRegressor: I was looking at the scikit-learn documentation for GradientBoostingRegressor. It says that we can use 'ls' as the loss function, which is least squares regression. But I am confused, since least squares regression is a method that minimizes the SSE loss function. So shouldn't they mention SSE here?
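For what it's worth, 'ls' names the per-sample squared-error criterion, and minimizing it is equivalent to minimizing the SSE since the two differ only by the constant factor 1/n. A quick check (recent scikit-learn releases spell the option 'squared_error'):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=5.0, random_state=0)
model = GradientBoostingRegressor(loss="squared_error", random_state=0).fit(X, y)

resid = y - model.predict(X)
sse = np.sum(resid ** 2)
mse = np.mean(resid ** 2)
print(sse, mse * len(y))   # the same objective, up to the constant 1/n factor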
Category: Data Science

What is init_score in lightGBM?

In the LightGBM R tutorial on boosting from an existing prediction, there is an init_score parameter in the setinfo function. I am wondering what init_score means. The help page says: init_score: initial score is the base prediction lightgbm will boost from. Another question: what does "boost" mean in LightGBM?
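A sketch using the Python API (the R setinfo call sets the same field): init_score is the starting margin the trees are boosted from, i.e. each boosting round fits corrections on top of it; note that, at least in the Python API, predict() returns only the trees' contribution, so the baseline has to be added back:

import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = 3 * X[:, 0] + rng.normal(size=2000)

# pretend an existing model already predicts a baseline; boosting continues from it
baseline = np.full(len(y), y.mean())

train_set = lgb.Dataset(X, label=y, init_score=baseline)
params = {"objective": "regression", "verbosity": -1}
booster = lgb.train(params, train_set, num_boost_round=100)

# the trees were fit to improve on `baseline`, so the baseline is added back here
final_pred = baseline + booster.predict(X)
print("MSE:", np.mean((y - final_pred) ** 2))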
Topic: self-study gbm
Category: Data Science

How to fit pairwise ranking models in XGBoost?

As far as I know, to train learning-to-rank models you need three things in the dataset: a label or relevance, a group or query id, and a feature vector. For example, the Microsoft Learning to Rank dataset uses this format (label, group id, and features): 1 qid:10 1:0.031310 2:0.666667 ... 0 qid:10 1:0.078682 2:0.166667 ... I am trying out XGBoost, which uses GBMs to do pairwise ranking. They have an example for a ranking task that uses the C++ program …
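Besides the C++ demo, the Python package exposes the same label/group/features triple through XGBRanker; a minimal sketch with made-up queries:

import numpy as np
from xgboost import XGBRanker

rng = np.random.default_rng(0)

# 3 queries with 8 documents each: features, graded relevance labels, and group sizes
X = rng.normal(size=(24, 5))
y = rng.integers(0, 3, size=24)          # relevance per document
group = [8, 8, 8]                        # documents per query, in order

ranker = XGBRanker(objective="rank:pairwise", n_estimators=50, max_depth=3)
ranker.fit(X, y, group=group)

# score the documents of the first query; higher score = ranked higher
scores = ranker.predict(X[:8])
print(np.argsort(scores)[::-1])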
Category: Data Science

Does XGBoost handle multicollinearity by itself?

I'm currently using XGBoost on a dataset with 21 features (selected from a list of some 150 features), which I then one-hot encoded to obtain ~98 features. A few of these 98 features are somewhat redundant; for example, a variable (feature) $A$ also appears as $\frac{B}{A}$ and $\frac{C}{A}$. My questions are: How (if at all) do boosted decision trees handle multicollinearity? How would the existence of multicollinearity affect prediction if it is not handled? From what I understand, the model is learning more …
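A quick way to see the typical behaviour is to duplicate a column and look at how the importance gets shared between the copies while the predictions stay essentially unchanged; a toy sketch (the exact split of importance depends on tie-breaking):

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=2000, n_features=5, random_state=0)
X_dup = np.hstack([X, X[:, [0]]])        # feature 5 is an exact copy of feature 0

model = xgb.XGBRegressor(n_estimators=100, max_depth=3, random_state=0)
model.fit(X_dup, y)

# the two collinear columns typically share the splits (and hence the importance)
print(dict(zip([f"f{i}" for i in range(6)], model.feature_importances_.round(3))))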
Category: Data Science

Random Forest but keep only leaves with impurities below a threshold

Is there an algorithm out there that creates a random forest but then prunes all the leaves whose impurity measure is above a certain threshold that I determine? In other words, if I set the minimum samples per leaf to 500 and require leaves to have at least 90% purity, for example, the algorithm would only keep leaves that respect these parameters. My dataset is extremely noisy, so most leaves have a Gini impurity around 0.5, but …
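There is no built-in option for this that I know of, but fitted scikit-learn trees expose impurity and n_node_samples per node, so the thresholds can be applied after training; a sketch that marks "trusted" leaves and counts, per sample, how many trees route it into one (using the hypothetical 500-sample / 90%-purity thresholds from the question):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5000, n_features=20, flip_y=0.3, random_state=0)
forest = RandomForestClassifier(n_estimators=50, min_samples_leaf=500, random_state=0)
forest.fit(X, y)

MAX_GINI = 0.18          # roughly 90% purity for a binary leaf (2 * 0.9 * 0.1 = 0.18)

# for each tree, mark the leaves that satisfy the impurity threshold
good_leaves = []
for est in forest.estimators_:
    t = est.tree_
    is_leaf = t.children_left == -1
    good_leaves.append(is_leaf & (t.impurity <= MAX_GINI))

# per sample, count how many trees route it into a "trusted" leaf
leaf_ids = forest.apply(X)               # shape (n_samples, n_trees)
trusted_votes = np.array([
    good_leaves[j][leaf_ids[:, j]] for j in range(len(forest.estimators_))
]).sum(axis=0)
print("samples landing in >= 25 trusted leaves:", int((trusted_votes >= 25).sum()))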
Category: Data Science

LightGBM - Why Exclusive Feature Bundling (EFB)?

I'm currently studying GBDTs and started reading LightGBM's research paper. In section 4 they explain the Exclusive Feature Bundling algorithm, which aims to reduce the number of features by grouping mutually exclusive features into bundles and treating each bundle as a single feature. The authors emphasize that one must be able to retrieve the original values of the features from the bundle. Question: if we have a categorical feature that has been one-hot encoded, won't this algorithm simply reverse the …
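An illustration of the bundling idea itself (not LightGBM's internal code): two columns that are never non-zero together can be merged into one column by offsetting the second column's values, and both originals remain recoverable, which is the requirement the paper states:

import numpy as np

# two mutually exclusive (one-hot style) features: never non-zero together
a = np.array([1, 0, 0, 1, 0])
b = np.array([0, 2, 3, 0, 0])     # e.g. a binned feature that is 0 whenever `a` is active

# bundle: keep a's values as-is, shift b's non-zero values past a's range
offset = a.max()                   # = 1
bundle = np.where(b != 0, b + offset, a)
print(bundle)                      # [1 3 4 1 0]

# the originals are recoverable from the bundle
a_back = np.where(bundle <= offset, bundle, 0)
b_back = np.where(bundle > offset, bundle - offset, 0)
assert (a_back == a).all() and (b_back == b).all()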
Category: Data Science

Need help understanding xgboost's approximate split points proposal

Background: in XGBoost, the $t$-th iteration fits a tree $f_t$ over all $n$ examples, minimizing the following objective: $$\sum_{i=1}^n\left[g_if_t(x_i) + \frac{1}{2}h_if_t^2(x_i)\right]$$ where $g_i, h_i$ are the first- and second-order derivatives of the loss with respect to our previous best estimate $\hat{y}$ (from iteration $t-1$): $g_i=d_{\hat{y}}l(y_i, \hat{y})$, $h_i=d^2_{\hat{y}}l(y_i, \hat{y})$, and $l$ is our loss function. The question (finally): when building $f_t$ and considering a specific feature $k$ at a specific split, they use the following heuristic to assess only some …
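For the background part, a numpy sketch of what hessian-weighted candidate split points look like in principle (this mimics the idea of the weighted quantile sketch, not xgboost's actual data structure):

import numpy as np

rng = np.random.default_rng(0)
x_k = rng.normal(size=10_000)            # values of feature k
h = rng.uniform(0.1, 2.0, size=10_000)   # second-order statistics h_i used as weights

def weighted_quantile_candidates(values, weights, eps=0.1):
    # sort by feature value and pick cut points so that each bucket holds
    # roughly an `eps` fraction of the total hessian weight
    order = np.argsort(values)
    v, w = values[order], weights[order]
    rank = np.cumsum(w) / w.sum()        # weighted rank function in [0, 1]
    targets = np.arange(eps, 1.0, eps)
    idx = np.searchsorted(rank, targets)
    return v[idx]

print(weighted_quantile_candidates(x_k, h))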
Topic: xgboost gbm
Category: Data Science

Fit Decision Tree to Gradient Boosted Trees for Interpretability

I was wondering whether there is literature on, or whether someone could explain, how to fit a decision tree to a gradient boosted trees classifier in order to derive more interpretable results. This is apparently the approach Turi uses in its explain function, which outputs something like this: Turi's explain function, from their page here. I know that for random forests you can average the contribution of each feature in every tree, as done in the TreeInterpreter Python package, but …
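The usual recipe, often called a global surrogate or distillation, is to fit a single shallow tree to the boosted model's predictions rather than to the labels, and to report how faithfully it mimics the ensemble; a sketch:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=5000, n_features=10, random_state=0)

# the opaque model
gbt = GradientBoostingClassifier(random_state=0).fit(X, y)

# the surrogate: a shallow tree trained to mimic the GBT's predictions, not the labels
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, gbt.predict(X))

# how faithfully the surrogate reproduces the GBT (report this alongside the rules)
print("fidelity:", surrogate.score(X, gbt.predict(X)))
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(10)]))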
Category: Data Science

Which loss functions does h2o.gbm use by default?

The GBM implementation in the h2o package only allows the user to specify a loss function via the distribution argument, which defaults to multinomial for categorical response variables and gaussian for numerical response variables. According to the documentation, the loss functions are implied by the distributions, but I need to know which loss functions are actually used, and I can't find that anywhere in the documentation. I'm guessing it's MSE for gaussian and cross-entropy for multinomial - does anybody here …
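Whatever the implied defaults turn out to be, the distribution can at least be made explicit rather than relied on; a small h2o-Python sketch with a toy frame and a numeric response:

import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()

# toy frame with a numeric response; state the distribution explicitly instead of
# relying on the default implied by the response type
frame = h2o.H2OFrame({"x1": [1, 2, 3, 4, 5, 6, 7, 8],
                      "x2": [0, 1, 0, 1, 0, 1, 0, 1],
                      "y":  [1.2, 2.3, 2.9, 4.1, 5.2, 5.8, 7.1, 8.2]})

gbm = H2OGradientBoostingEstimator(distribution="gaussian", ntrees=20)
gbm.train(x=["x1", "x2"], y="y", training_frame=frame)
print(gbm.model_performance(frame).mse())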
Category: Data Science

How to reconstruct a scikit-learn predictor for Gradient Boosting Regressor?

I would like to train my datasets in scikit-learn but export the final Gradient Boosting Regressor elsewhere so that I can make predictions directly on another platform. I am aware that we can obtain the individual decision trees used by the regressor by accessing regressor.estimators_[i, 0].tree_. What I would like to know is how to combine these decision trees to reproduce the final regression prediction.
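For the default squared-error loss, the final prediction is the initial estimator's prediction plus learning_rate times the sum of the individual trees' predictions; a sketch that rebuilds it by hand and checks it against predict() before porting:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=1000, n_features=8, random_state=0)
reg = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
reg.fit(X, y)

# raw prediction = initial estimator + learning_rate * sum of all stage trees
manual = reg.init_.predict(X).ravel()
for stage in reg.estimators_[:, 0]:          # estimators_ has shape (n_estimators, 1)
    manual += reg.learning_rate * stage.predict(X)

# for the squared-error loss the raw prediction is the final prediction
print(np.allclose(manual, reg.predict(X)))   # expected: True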
Category: Data Science

What quantile is used for the initial DummyRegressor for Gradient Boosting Regressor in scikit-learn?

According to the documentation of Scikit-Learn Gradient Boosting Regressor: init: estimator or ‘zero’, default=None: An estimator object that is used to compute the initial predictions. init has to provide fit and predict. If ‘zero’, the initial raw predictions are set to zero. By default a DummyEstimator is used, predicting either the average target value (for loss=’ls’), or a quantile for the other losses. So what quantile is used for the DummyRegressor if the loss function is 'huber'? Is it the …
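Rather than inferring it from the docs, the fitted init_ estimator can be inspected directly; a short sketch:

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=5, random_state=0)

reg = GradientBoostingRegressor(loss="huber", n_estimators=5, random_state=0).fit(X, y)

# the fitted initial estimator and its parameters (strategy / quantile, if any)
print(reg.init_)
print(reg.init_.get_params())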
Category: Data Science

Does Gradient Boosting perform n-ary splits where n > 2?

I wonder whether algorithms such as GBM, XGBoost, CatBoost, and LightGBM perform more than two-way splits at a node in their decision trees. Can a node be split into 3 or more branches instead of merely binary splits? Can more than one feature be used in deciding how to split a node? Can a feature be re-used when splitting a descendant node?
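One way to check for a given library is to dump the fitted trees and look at the node structure; an xgboost sketch (in the dump, each internal node tests a single feature and branches two ways, and the same feature can reappear in descendant nodes):

import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
model = xgb.XGBClassifier(n_estimators=3, max_depth=3).fit(X, y)

# print the structure of each boosted tree as plain text
for tree_txt in model.get_booster().get_dump():
    print(tree_txt)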
Category: Data Science
