Does LightGBM handle multicollinearity?

I have a dataset of around 6,500 features and 10,000 rows after feature selection, and I am using a LightGBM model. I want to know whether I should check the feature set for multicollinearity. If two or more features are correlated, how does that affect tree building and the classification predictions? How does LightGBM deal with multicollinearity? Does it have any adverse effects?
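
If you do decide to check, a minimal sketch (assuming the features live in a pandas DataFrame X; the 0.95 threshold is an arbitrary illustrative choice, and the 6,500 x 6,500 correlation matrix will occupy on the order of a few hundred MB):

import numpy as np

corr = X.corr().abs()                                                 # pairwise |correlation| between features
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))    # keep the upper triangle only
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]       # one member of each highly correlated pair
X_reduced = X.drop(columns=to_drop)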
Category: Data Science

Methods for augmenting binary datasets

I have a small dataset (~100 samples) with roughly 20 features, most of which are binary and a few (~5) numeric. I want to use methods for augmenting the training set and see if I can get better test accuracy. What methods/code can I use for augmenting binary datasets?
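
One simple option is noise-based augmentation: create perturbed copies of each training row by flipping a small fraction of the binary entries and jittering the numeric ones, then repeat the labels for each copy. A sketch, assuming X_bin and X_num are numpy arrays holding the binary and numeric columns and y the labels (the function name and rates are illustrative):

import numpy as np

rng = np.random.default_rng(0)

def augment(X_bin, X_num, y, n_copies=5, flip_prob=0.05, noise_scale=0.01):
    bins, nums, labels = [X_bin], [X_num], [y]
    for _ in range(n_copies):
        flips = rng.random(X_bin.shape) < flip_prob                        # flip ~5% of binary entries
        bins.append(np.where(flips, 1 - X_bin, X_bin))
        nums.append(X_num + rng.normal(0.0, noise_scale, X_num.shape))     # jitter numeric entries
        labels.append(y)
    return np.vstack(bins), np.vstack(nums), np.concatenate(labels)

If the real problem is class imbalance rather than sample size, SMOTE-style oversamplers that handle mixed binary/numeric features (e.g. SMOTENC in imbalanced-learn) are another option.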
Category: Data Science

Updating weights in Adaboost

I'm studying the AdaBoost algorithm. This algorithm updates the weights after training. This is the table they show when explaining weights in AdaBoost. I'm confused about what this "weight" means. Is it a weight for each node? A weight for each model? The weight column of the table (even more confusing)?
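
For reference, the weights being updated in AdaBoost are per training sample; the model itself gets a separate coefficient $\alpha_t$. A minimal numpy sketch of one round of the discrete AdaBoost update, assuming y (true labels) and pred (the weak learner's predictions) are arrays with values in {-1, +1}:

import numpy as np

w = np.full(len(y), 1.0 / len(y))            # sample weights, initially uniform
err = np.sum(w * (pred != y)) / np.sum(w)    # weighted error of this round's weak learner
alpha = 0.5 * np.log((1 - err) / err)        # the model's weight (its say in the final vote)
w = w * np.exp(-alpha * y * pred)            # up-weight misclassified samples, down-weight correct ones
w = w / w.sum()                              # renormalise so the sample weights sum to 1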
Topic: boosting
Category: Data Science

Example for Boosting

Can someone tell me exactly how boosting, as implemented by LightGBM or XGBoost, works in a real-world scenario? I know that LightGBM grows trees leaf-wise instead of level-wise, so each split is chosen for its contribution to the global loss rather than just the loss of one branch, which lets it reach a lower error faster than level-wise growth. But I cannot understand it completely until I see a real example; I have looked at so many articles and videos, but everywhere …
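
A minimal runnable sketch on synthetic data (all names and parameter values are just illustrative) that lets you watch boosting at work: each round adds one small tree fitted to the errors of the current ensemble, so the test error should drop as more trees are used:

import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = lgb.LGBMRegressor(n_estimators=50, num_leaves=8, learning_rate=0.1)
model.fit(X_tr, y_tr)

for k in (1, 5, 10, 25, 50):                     # predict with only the first k trees
    mse = mean_squared_error(y_te, model.predict(X_te, num_iteration=k))
    print(f"{k:>2} trees -> test MSE {mse:.3f}")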
Category: Data Science

Adaboost - Show that adjusting weights brings error of current iteration to 0.5

I'm trying to solve the following problem but I've gotten somewhat stuck. For AdaBoost, $err_t = \frac{\sum_{i=1}^{N}w_i \Pi (h_t(x^{(i)}) \neq t^{(i)})}{\sum_{i=1}^{N}w_i}$ and $\alpha_t = \frac{1}{2}\ln\left(\frac{1-err_t}{err_t}\right)$. The weights for the next iteration are $w_i' = w_i \exp(-\alpha_t t^{(i)} h_t(x^{(i)}))$, and this assumes $t^{(i)}$ and $h_t(x^{(i)})$ take values in $\{-1, +1\}$. I have to show that the error with respect to the new weights $w_i'$ is $\frac{1}{2}$, i.e., $err_t' = \frac{\sum_{i=1}^{N}w_i' \Pi (h_t(x^{(i)}) \neq t^{(i)})}{\sum_{i=1}^{N}w_i'} = \frac{1}{2}$ …
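
A sketch of the algebra, using the question's own notation: write $W_{\text{mis}} = \sum_{i:\, h_t(x^{(i)}) \neq t^{(i)}} w_i$ and $W_{\text{cor}} = \sum_{i:\, h_t(x^{(i)}) = t^{(i)}} w_i$, so that $err_t = \frac{W_{\text{mis}}}{W_{\text{mis}} + W_{\text{cor}}}$. Since $t^{(i)} h_t(x^{(i)}) = -1$ on misclassified points and $+1$ on correct ones, the update gives $w_i' = w_i e^{\alpha_t}$ for misclassified points and $w_i' = w_i e^{-\alpha_t}$ for correct ones. Hence $err_t' = \frac{W_{\text{mis}} e^{\alpha_t}}{W_{\text{mis}} e^{\alpha_t} + W_{\text{cor}} e^{-\alpha_t}}$, and with $e^{\alpha_t} = \sqrt{\frac{1-err_t}{err_t}} = \sqrt{\frac{W_{\text{cor}}}{W_{\text{mis}}}}$ both terms in the denominator equal $\sqrt{W_{\text{mis}} W_{\text{cor}}}$, so $err_t' = \frac{1}{2}$.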
Category: Data Science

How to ensure same encoding pattern?

I created an XGBRegressor model with certain 'object'-dtype columns encoded in the data. Now, when I run the model on a new set of data that has been freshly encoded, it gives wrong predictions. How can I ensure that the new dataset is encoded in the same way as the training data? Or is there any other solution to this problem?
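
The usual approach is to fit the encoder once on the training data, persist it next to the model, and reuse the same fitted object for any new data. A hedged sketch with scikit-learn's OrdinalEncoder (0.24+ for the handle_unknown option); X_train, X_new, and obj_cols (the list of object-dtype column names) are placeholders:

import joblib
from sklearn.preprocessing import OrdinalEncoder

enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
X_train[obj_cols] = enc.fit_transform(X_train[obj_cols])    # fit on the training data only
joblib.dump(enc, "encoder.joblib")                          # save alongside the model

enc = joblib.load("encoder.joblib")                         # later, at prediction time
X_new[obj_cols] = enc.transform(X_new[obj_cols])            # same mapping as during training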
Category: Data Science

How to improve model performance when the model shows a systematic pattern in residuals

I'm working on a regression model using boosting algorithms (CatBoost, XGBoost, and LightGBM). All models give a similar RMSE of about 0.2 (the target varies from 0 to 1). I obtained the following plots when I plotted the residuals. My model is overpredicting for small target values (near zero) and underpredicting for large target values (near 1). How can I improve my model's performance? The model is not overfitting, and I'm doing an exhaustive hyperparameter search and basic feature engineering. I'm trying to …
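
One thing that is sometimes worth trying when a bounded target gets compressed at both ends is to fit on a transformed target and invert the transform at prediction time. A hedged sketch (X_train, y_train, X_test are placeholders; the clipping epsilon is arbitrary) using a logit transform with LightGBM:

import numpy as np
import lightgbm as lgb
from scipy.special import logit, expit

eps = 1e-3
y_logit = logit(np.clip(y_train, eps, 1 - eps))    # stretch the [0, 1] target onto the real line

model = lgb.LGBMRegressor()
model.fit(X_train, y_logit)

pred = expit(model.predict(X_test))                # map predictions back into [0, 1]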
Category: Data Science

Understanding Weighted learning in Ensemble Classifiers

I'm currently studying boosting techniques in machine learning, and I understand that in algorithms like AdaBoost each training sample is given a weight depending on whether it was misclassified by the previous model in the sequential boosting process. Although I intuitively understand that by weighting examples we let the model pay more attention to examples that were previously misclassified, I do not understand "how" the weights are taken into account by a machine learning algorithm. …
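
Concretely, there are two standard mechanisms, sketched below with a scikit-learn decision stump (X, y, w are placeholders for numpy arrays of data, labels, and the current sample weights): either the learner's training criterion is computed with weighted counts, or the training set is resampled in proportion to the weights.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

stump = DecisionTreeClassifier(max_depth=1)

# Mechanism 1: weighted loss/impurity. Gini impurity and leaf values are computed
# with weighted counts, so heavily weighted samples pull the chosen splits toward them.
stump.fit(X, y, sample_weight=w)

# Mechanism 2: weighted resampling, for learners without sample_weight support.
idx = np.random.default_rng(0).choice(len(X), size=len(X), p=w / w.sum())
stump.fit(X[idx], y[idx])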
Category: Data Science

How does XGBoost's exact greedy split finding algorithm determine candidate split values for different feature types?

Based on the paper by Chen & Guestrin (2016) "XGBoost: A Scalable Tree Boosting System", XGBoost's "exact split finding algorithm enumerates over all the possible splits on all the features to find the best split" (page 3). Thus, my understanding was that XGBoost enumerates over all features and uses unique values of each feature as candidate split points to then choose the split value that maximizes the splitting criterion (gain). Then, my question is why do the split values for …
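
As a toy illustration (this is not XGBoost's actual source, just a sketch of the idea), the exact greedy method sorts each numeric feature and considers a threshold between every pair of consecutive distinct values; the reported split value is often the midpoint of those two values:

import numpy as np

def candidate_splits(feature_values):
    v = np.sort(np.unique(feature_values))    # sorted distinct values of one numeric feature
    return (v[:-1] + v[1:]) / 2.0              # one candidate threshold between each consecutive pair

print(candidate_splits(np.array([3.0, 1.0, 1.0, 2.0, 5.0])))    # -> [1.5 2.5 4. ]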
Category: Data Science

How to make LightGBM to suppress output?

I have tried for a while to figure out how to "shut up" LightGBM. Especially, I would like to suppress the output of LightGBM during training (i.e. feedback on the boosting steps). My model:

params = {
    'objective': 'regression',
    'learning_rate': 0.9,
    'max_depth': 1,
    'metric': 'mean_squared_error',
    'seed': 7,
    'boosting_type': 'gbdt'
}
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=100000,
                valid_sets=lgb_eval,
                early_stopping_rounds=100)

I tried to add verbose=0 as suggested in the docs, but this does not work. https://github.com/microsoft/LightGBM/blob/master/docs/Parameters.rst Does anyone know how to …
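
For reference, a hedged sketch of what usually silences it, assuming a reasonably recent LightGBM (3.3+), where per-iteration logging and early stopping are controlled through callbacks; lgb_train and lgb_eval are the Dataset objects from the question:

import lightgbm as lgb

params = {
    'objective': 'regression',
    'learning_rate': 0.9,
    'max_depth': 1,
    'metric': 'mean_squared_error',
    'seed': 7,
    'boosting_type': 'gbdt',
    'verbose': -1,                        # silence LightGBM's own info/warning messages
}

gbm = lgb.train(
    params,
    lgb_train,
    num_boost_round=100000,
    valid_sets=[lgb_eval],
    callbacks=[
        lgb.early_stopping(stopping_rounds=100, verbose=False),   # no early-stopping messages
        lgb.log_evaluation(period=0),                             # no per-iteration metric output
    ],
)

On older versions, the same effect came from the verbose_eval argument of lgb.train rather than the log_evaluation callback.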
Category: Data Science

Question from a paper: I do not understand why it is stated that SGD employs bootstrapping to calculate the gradient

In this paper, they state that: "As SGD employs the bootstrapping (i.e., random sampling with replacement) [67] for gradient calculation, we can obtain the unbiased estimation of standard gradients calculated by all the data, i.e., $E[\nabla f_{i_t}(w_t)] \leftarrow \nabla f(w_t)$." As far as I know, while using bootstrapping, supposing we have a dataset ABCD, we create multiple datasets from that initial one, for example AABD, DCDA, BAAD, CBAA, AAAB, etc. However, in SGD, we first …
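
For what it's worth, the unbiasedness only needs the sampled index to be drawn uniformly, not a full bootstrap resample. Assuming the usual convention $f(w) = \frac{1}{N}\sum_{i=1}^{N} f_i(w)$ and $i_t$ drawn uniformly from $\{1, \dots, N\}$: $E[\nabla f_{i_t}(w_t)] = \sum_{i=1}^{N} \Pr(i_t = i)\, \nabla f_i(w_t) = \frac{1}{N}\sum_{i=1}^{N} \nabla f_i(w_t) = \nabla f(w_t)$.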
Category: Data Science

If a feature has already been used to split, is it unlikely to be selected to split again in a subsequent tree of a Gradient Boosting Tree?

I have asked this question here, but it seems no one was interested in it: https://stats.stackexchange.com/questions/550994/if-a-feature-has-already-split-will-it-hardly-be-selected-to-split-again-in-the If a feature has already been used to split, is it unlikely to be selected to split again in a subsequent tree of a Gradient Boosting Tree? The question is motivated by the fact that, among heavily correlated features in a single tree, usually only one is selected to split, since little uncertainty remains after the split. Now, in Gradient Boosting Trees, is the residual …
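
One way to check this empirically on your own model (a sketch, assuming a fitted LightGBM sklearn-wrapper model and a version recent enough to provide trees_to_dataframe(); tree_index and split_feature are the column names current versions emit):

splits = model.booster_.trees_to_dataframe()          # use model.trees_to_dataframe() on a raw Booster
per_tree = (splits.dropna(subset=["split_feature"])   # leaf rows have no split feature
                  .groupby("tree_index")["split_feature"]
                  .apply(set))
print(per_tree.head(20))                              # which features each successive tree actually split on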
Category: Data Science

How does XGBoost perform training in parallel?

From what I know about boosting techniques, we train on the data and update the weights of falsely predicted values, or try to minimize the loss in the next model. So basically it is a sequential process where we feed the output of one model into the next. In XGBoost it is said that the model trains in parallel via data parallelization or model parallelization, so I am not able to understand: if that is the case, then how are we feeding the output of the first model …
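
For context, the parallelism in XGBoost happens inside the construction of each individual tree (candidate splits across features are scanned by multiple threads), while the boosting rounds themselves stay sequential. A hedged sketch of the relevant knob, assuming the scikit-learn wrapper and placeholder X_train, y_train:

from xgboost import XGBRegressor

# n_jobs sets how many threads scan candidate splits while ONE tree is built;
# trees are still added one after another, each fitted to the previous ensemble's errors.
model = XGBRegressor(n_estimators=200, n_jobs=4)
model.fit(X_train, y_train)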
Category: Data Science

Why does CatBoost outperform other boosting algorithms?

I have noticed, while working with multiple datasets, that CatBoost with its default parameters tends to outperform LightGBM or XGBoost with their default parameters, even on tabular datasets with no categorical features. I am assuming this has something to do with the way CatBoost constructs its decision trees, but I just wanted to confirm this theory. If anyone could elaborate on why it performs better on non-categorical data, that would be great! Thanks in advance!
Category: Data Science

Gradient boosting algorithms and filling categorical variables

I have the house prices dataset (link on Kaggle) and I am facing a dilemma. Some categorical variables have a clear majority class. If we look at the MSZoning and SaleType columns, the "RL" type accounts for 91% of the values in MSZoning and "WD" for 87% in SaleType, respectively. Before I apply encodings or labelings, I need to decide whether to fill missing values with None or with the mode. In other words, if we are pretty certain that some type of data will …
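
A minimal pandas sketch of the two candidate strategies, assuming the Kaggle data and the MSZoning column from the question (the new column names are just illustrative):

import pandas as pd

df = pd.read_csv("train.csv")    # or whichever split holds the missing values

# Strategy 1: treat missingness as its own category
df["MSZoning_none"] = df["MSZoning"].fillna("None")

# Strategy 2: impute with the most frequent value (the mode, "RL" here)
df["MSZoning_mode"] = df["MSZoning"].fillna(df["MSZoning"].mode()[0])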
Category: Data Science

Train and test data fixed during boosting?

I have a question about boosting algorithms. I know that boosting is a sequential process that gives higher weight to the misclassifications of the previous model. So, are the train and test data fixed throughout this sequential process? Does it predict on the data used for training to determine whether a sample is misclassified, and then give that sample a larger weight when training the next model? Thanks in advance.
Category: Data Science
