Does LightGBM handle multicollinearity?

I have a dataset of around 6,500 features and 10,000 rows after feature selection, and I am using a LightGBM model. I want to know whether I should check the feature set for multicollinearity. If two or more features are correlated, how does that affect tree building and the classification predictions? How does LightGBM deal with multicollinearity? Does it have any adverse effects?
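
If you do decide to check, a minimal sketch (assuming the features live in a pandas DataFrame X; the 0.95 threshold is an arbitrary illustrative choice, and the 6,500 x 6,500 correlation matrix will occupy on the order of a few hundred MB):

import numpy as np

corr = X.corr().abs()                                                 # pairwise |correlation| between features
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))    # keep the upper triangle only
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]       # one member of each highly correlated pair
X_reduced = X.drop(columns=to_drop)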
Category: Data Science

Methods for augmenting binary datasets

I have a small dataset (~100 samples) with roughly 20 features, most of which are binary and a few (~5) numeric. I want to use methods for augmenting the training set and see if I can get better test accuracy. What methods/code can I use for augmenting binary datasets?
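
One simple option is noise-based augmentation: create perturbed copies of each training row by flipping a small fraction of the binary entries and jittering the numeric ones, then repeat the labels for each copy. A sketch, assuming X_bin and X_num are numpy arrays holding the binary and numeric columns and y the labels (the function name and rates are illustrative):

import numpy as np

rng = np.random.default_rng(0)

def augment(X_bin, X_num, y, n_copies=5, flip_prob=0.05, noise_scale=0.01):
    bins, nums, labels = [X_bin], [X_num], [y]
    for _ in range(n_copies):
        flips = rng.random(X_bin.shape) < flip_prob                        # flip ~5% of binary entries
        bins.append(np.where(flips, 1 - X_bin, X_bin))
        nums.append(X_num + rng.normal(0.0, noise_scale, X_num.shape))     # jitter numeric entries
        labels.append(y)
    return np.vstack(bins), np.vstack(nums), np.concatenate(labels)

If the real problem is class imbalance rather than sample size, SMOTE-style oversamplers that handle mixed binary/numeric features (e.g. SMOTENC in imbalanced-learn) are another option.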
Category: Data Science

Updating weights in Adaboost

I'm studying the AdaBoost algorithm. This algorithm updates the weights after training. This is the table they show when explaining weights in AdaBoost. I'm confused about what this "weight" means. Is it a weight for each node? A weight for each model? The weight column of the table (even more confusing)?
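
For reference, the weights being updated in AdaBoost are per training sample; the model itself gets a separate coefficient $\alpha_t$. A minimal numpy sketch of one round of the discrete AdaBoost update, assuming y (true labels) and pred (the weak learner's predictions) are arrays with values in {-1, +1}:

import numpy as np

w = np.full(len(y), 1.0 / len(y))            # sample weights, initially uniform
err = np.sum(w * (pred != y)) / np.sum(w)    # weighted error of this round's weak learner
alpha = 0.5 * np.log((1 - err) / err)        # the model's weight (its say in the final vote)
w = w * np.exp(-alpha * y * pred)            # up-weight misclassified samples, down-weight correct ones
w = w / w.sum()                              # renormalise so the sample weights sum to 1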
Topic: boosting
Category: Data Science

Example for Boosting

Can someone tell me exactly how boosting, as implemented by LightGBM or XGBoost, works in a real-world scenario? I know that LightGBM grows trees leaf-wise instead of level-wise, so each split is chosen for its contribution to the global loss rather than just the loss of one branch, which lets it reach a lower error faster than level-wise growth. But I cannot understand it completely until I see a real example; I have looked at so many articles and videos, but everywhere …
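
A minimal runnable sketch on synthetic data (all names and parameter values are just illustrative) that lets you watch boosting at work: each round adds one small tree fitted to the errors of the current ensemble, so the test error should drop as more trees are used:

import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = lgb.LGBMRegressor(n_estimators=50, num_leaves=8, learning_rate=0.1)
model.fit(X_tr, y_tr)

for k in (1, 5, 10, 25, 50):                     # predict with only the first k trees
    mse = mean_squared_error(y_te, model.predict(X_te, num_iteration=k))
    print(f"{k:>2} trees -> test MSE {mse:.3f}")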
Category: Data Science

Adaboost - Show that adjusting weights brings error of current iteration to 0.5

I'm trying to solve the following problem but I've gotten somewhat stuck. For AdaBoost, $err_t = \frac{\sum_{i=1}^{N}w_i \Pi (h_t(x^{(i)}) \neq t^{(i)})}{\sum_{i=1}^{N}w_i}$ and $\alpha_t = \frac{1}{2}\ln\left(\frac{1-err_t}{err_t}\right)$. The weights for the next iteration are $w_i' = w_i \exp(-\alpha_t t^{(i)} h_t(x^{(i)}))$, and this assumes $t^{(i)}$ and $h_t(x^{(i)})$ take values in $\{-1, +1\}$. I have to show that the error with respect to the new weights $w_i'$ is $\frac{1}{2}$, i.e., $err_t' = \frac{\sum_{i=1}^{N}w_i' \Pi (h_t(x^{(i)}) \neq t^{(i)})}{\sum_{i=1}^{N}w_i'} = \frac{1}{2}$ …
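
A sketch of the algebra, using the question's own notation: write $W_{\text{mis}} = \sum_{i:\, h_t(x^{(i)}) \neq t^{(i)}} w_i$ and $W_{\text{cor}} = \sum_{i:\, h_t(x^{(i)}) = t^{(i)}} w_i$, so that $err_t = \frac{W_{\text{mis}}}{W_{\text{mis}} + W_{\text{cor}}}$. Since $t^{(i)} h_t(x^{(i)}) = -1$ on misclassified points and $+1$ on correct ones, the update gives $w_i' = w_i e^{\alpha_t}$ for misclassified points and $w_i' = w_i e^{-\alpha_t}$ for correct ones. Hence $err_t' = \frac{W_{\text{mis}} e^{\alpha_t}}{W_{\text{mis}} e^{\alpha_t} + W_{\text{cor}} e^{-\alpha_t}}$, and with $e^{\alpha_t} = \sqrt{\frac{1-err_t}{err_t}} = \sqrt{\frac{W_{\text{cor}}}{W_{\text{mis}}}}$ both terms in the denominator equal $\sqrt{W_{\text{mis}} W_{\text{cor}}}$, so $err_t' = \frac{1}{2}$.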
Category: Data Science

How to ensure same encoding pattern?

I created an XGBRegressor model with certain 'object'-dtype columns encoded in the data. Now, when I run the model on a new set of data that has been freshly encoded, it gives wrong predictions. How can I ensure that the new dataset is encoded in the same way as the training data? Or is there any other solution to this problem?
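
The usual approach is to fit the encoder once on the training data, persist it next to the model, and reuse the same fitted object for any new data. A hedged sketch with scikit-learn's OrdinalEncoder (0.24+ for the handle_unknown option); X_train, X_new, and obj_cols (the list of object-dtype column names) are placeholders:

import joblib
from sklearn.preprocessing import OrdinalEncoder

enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
X_train[obj_cols] = enc.fit_transform(X_train[obj_cols])    # fit on the training data only
joblib.dump(enc, "encoder.joblib")                          # save alongside the model

enc = joblib.load("encoder.joblib")                         # later, at prediction time
X_new[obj_cols] = enc.transform(X_new[obj_cols])            # same mapping as during training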
Category: Data Science

How to improve model performance when the model shows a systematic pattern in residuals

I'm working on a regression model using boosting algorithms (CatBoost, XGBoost, and LightGBM). All models give a similar RMSE of about 0.2 (the target varies from 0 to 1). I obtained the following plots when I plotted the residuals. My model is overpredicting for small target values (near zero) and underpredicting for large target values (near 1). How can I improve my model's performance? The model is not overfitting, and I'm doing an exhaustive hyperparameter search and basic feature engineering. I'm trying to …
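
One thing that is sometimes worth trying when a bounded target gets compressed at both ends is to fit on a transformed target and invert the transform at prediction time. A hedged sketch (X_train, y_train, X_test are placeholders; the clipping epsilon is arbitrary) using a logit transform with LightGBM:

import numpy as np
import lightgbm as lgb
from scipy.special import logit, expit

eps = 1e-3
y_logit = logit(np.clip(y_train, eps, 1 - eps))    # stretch the [0, 1] target onto the real line

model = lgb.LGBMRegressor()
model.fit(X_train, y_logit)

pred = expit(model.predict(X_test))                # map predictions back into [0, 1]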
Category: Data Science

Understanding Weighted learning in Ensemble Classifiers

I'm currently studying boosting techniques in machine learning, and I understand that in algorithms like AdaBoost each training sample is given a weight depending on whether it was misclassified by the previous model in the sequential boosting process. Although I intuitively understand that by weighting examples we let the model pay more attention to examples that were previously misclassified, I do not understand "how" the weights are taken into account by a machine learning algorithm. …
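
Concretely, there are two standard mechanisms, sketched below with a scikit-learn decision stump (X, y, w are placeholders for numpy arrays of data, labels, and the current sample weights): either the learner's training criterion is computed with weighted counts, or the training set is resampled in proportion to the weights.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

stump = DecisionTreeClassifier(max_depth=1)

# Mechanism 1: weighted loss/impurity. Gini impurity and leaf values are computed
# with weighted counts, so heavily weighted samples pull the chosen splits toward them.
stump.fit(X, y, sample_weight=w)

# Mechanism 2: weighted resampling, for learners without sample_weight support.
idx = np.random.default_rng(0).choice(len(X), size=len(X), p=w / w.sum())
stump.fit(X[idx], y[idx])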
Category: Data Science

How does XGBoost's exact greedy split finding algorithm determine candidate split values for different feature types?

Based on the paper by Chen & Guestrin (2016) "XGBoost: A Scalable Tree Boosting System", XGBoost's "exact split finding algorithm enumerates over all the possible splits on all the features to find the best split" (page 3). Thus, my understanding was that XGBoost enumerates over all features and uses unique values of each feature as candidate split points to then choose the split value that maximizes the splitting criterion (gain). Then, my question is why do the split values for …
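
As a toy illustration (this is not XGBoost's actual source, just a sketch of the idea), the exact greedy method sorts each numeric feature and considers a threshold between every pair of consecutive distinct values; the reported split value is often the midpoint of those two values:

import numpy as np

def candidate_splits(feature_values):
    v = np.sort(np.unique(feature_values))    # sorted distinct values of one numeric feature
    return (v[:-1] + v[1:]) / 2.0              # one candidate threshold between each consecutive pair

print(candidate_splits(np.array([3.0, 1.0, 1.0, 2.0, 5.0])))    # -> [1.5 2.5 4. ]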
Category: Data Science

How to make LightGBM to suppress output?

I have tried for a while to figure out how to "shut up" LightGBM. Especially, I would like to suppress the output of LightGBM during training (i.e. feedback on the boosting steps). My model:

params = {
    'objective': 'regression',
    'learning_rate': 0.9,
    'max_depth': 1,
    'metric': 'mean_squared_error',
    'seed': 7,
    'boosting_type': 'gbdt'
}
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=100000,
                valid_sets=lgb_eval,
                early_stopping_rounds=100)

I tried to add verbose=0 as suggested in the docs, but this does not work. https://github.com/microsoft/LightGBM/blob/master/docs/Parameters.rst Does anyone know how to …
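
For reference, a hedged sketch of what usually silences it, assuming a reasonably recent LightGBM (3.3+), where per-iteration logging and early stopping are controlled through callbacks; lgb_train and lgb_eval are the Dataset objects from the question:

import lightgbm as lgb

params = {
    'objective': 'regression',
    'learning_rate': 0.9,
    'max_depth': 1,
    'metric': 'mean_squared_error',
    'seed': 7,
    'boosting_type': 'gbdt',
    'verbose': -1,                        # silence LightGBM's own info/warning messages
}

gbm = lgb.train(
    params,
    lgb_train,
    num_boost_round=100000,
    valid_sets=[lgb_eval],
    callbacks=[
        lgb.early_stopping(stopping_rounds=100, verbose=False),   # no early-stopping messages
        lgb.log_evaluation(period=0),                             # no per-iteration metric output
    ],
)

On older versions, the same effect came from the verbose_eval argument of lgb.train rather than the log_evaluation callback.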
Category: Data Science

Question from a paper: I do not understand why it is stated that SGD employs bootstrapping to calculate the gradient

In this paper, they state that: "As SGD employs the bootstrapping (i.e., random sampling with replacement) [67] for gradient calculation, we can obtain the unbiased estimation of standard gradients calculated by all the data, i.e., $E[\nabla f_{i_t}(w_t)] \leftarrow \nabla f(w_t)$." As far as I know, while using bootstrapping, supposing we have a dataset ABCD, we create multiple datasets from that initial one, for example AABD, DCDA, BAAD, CBAA, AAAB, etc. However, in SGD, we first …
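
For what it's worth, the unbiasedness only needs the sampled index to be drawn uniformly, not a full bootstrap resample. Assuming the usual convention $f(w) = \frac{1}{N}\sum_{i=1}^{N} f_i(w)$ and $i_t$ drawn uniformly from $\{1, \dots, N\}$: $E[\nabla f_{i_t}(w_t)] = \sum_{i=1}^{N} \Pr(i_t = i)\, \nabla f_i(w_t) = \frac{1}{N}\sum_{i=1}^{N} \nabla f_i(w_t) = \nabla f(w_t)$.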
Category: Data Science

If a feature has already been used to split, is it unlikely to be selected to split again in a subsequent tree of a Gradient Boosting Tree?

I have asked this question here, but it seems no one was interested in it: https://stats.stackexchange.com/questions/550994/if-a-feature-has-already-split-will-it-hardly-be-selected-to-split-again-in-the If a feature has already been used to split, is it unlikely to be selected to split again in a subsequent tree of a Gradient Boosting Tree? The question is motivated by the fact that, among heavily correlated features in a single tree, usually only one is selected to split, since little uncertainty remains after the split. Now, in Gradient Boosting Trees, is the residual …
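
One way to check this empirically on your own model (a sketch, assuming a fitted LightGBM sklearn-wrapper model and a version recent enough to provide trees_to_dataframe(); tree_index and split_feature are the column names current versions emit):

splits = model.booster_.trees_to_dataframe()          # use model.trees_to_dataframe() on a raw Booster
per_tree = (splits.dropna(subset=["split_feature"])   # leaf rows have no split feature
                  .groupby("tree_index")["split_feature"]
                  .apply(set))
print(per_tree.head(20))                              # which features each successive tree actually split on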
Category: Data Science

How does XGBoost perform training in parallel?

From what I know about boosting techniques, we train on the data and update the weights of falsely predicted values, or try to minimize the loss in the next model. So basically it is a sequential process where we feed the output of one model into the next. In XGBoost it is said that the model trains in parallel via data parallelization or model parallelization, so I am not able to understand: if that is the case, then how are we feeding the output of the first model …
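
For context, the parallelism in XGBoost happens inside the construction of each individual tree (candidate splits across features are scanned by multiple threads), while the boosting rounds themselves stay sequential. A hedged sketch of the relevant knob, assuming the scikit-learn wrapper and placeholder X_train, y_train:

from xgboost import XGBRegressor

# n_jobs sets how many threads scan candidate splits while ONE tree is built;
# trees are still added one after another, each fitted to the previous ensemble's errors.
model = XGBRegressor(n_estimators=200, n_jobs=4)
model.fit(X_train, y_train)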
Category: Data Science

Why does CatBoost outperform other boosting algorithms?

I have noticed, while working with multiple datasets, that CatBoost with its default parameters tends to outperform LightGBM or XGBoost with their default parameters, even on tabular datasets with no categorical features. I am assuming this has something to do with the way CatBoost constructs its decision trees, but I just wanted to confirm this theory. If anyone could elaborate on why it performs better on non-categorical data, that would be great! Thanks in advance!
Category: Data Science

Gradient boosting algorithms and filling categorical variables

I have the house prices dataset (link on Kaggle) and I am facing a dilemma. Some categorical variables have a clear majority class. If we look at the MSZoning and SaleType columns, the "RL" type accounts for 91% of the values in MSZoning and "WD" for 87% in SaleType, respectively. Before I apply encodings or labelings, I need to decide whether to fill missing values with None or with the mode. In other words, if we are pretty certain that some type of data will …
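
A minimal pandas sketch of the two candidate strategies, assuming the Kaggle data and the MSZoning column from the question (the new column names are just illustrative):

import pandas as pd

df = pd.read_csv("train.csv")    # or whichever split holds the missing values

# Strategy 1: treat missingness as its own category
df["MSZoning_none"] = df["MSZoning"].fillna("None")

# Strategy 2: impute with the most frequent value (the mode, "RL" here)
df["MSZoning_mode"] = df["MSZoning"].fillna(df["MSZoning"].mode()[0])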
Category: Data Science

Train and test data fixed during boosting?

I have a question about boosting algorithms. I know that boosting is a sequential process that gives higher weight to the misclassifications of the previous model. So, are the train and test data fixed throughout this sequential process? Does it predict on the data used for training to determine whether a sample is misclassified, and then give that sample a larger weight when training the next model? Thanks in advance.
Category: Data Science
