I'm not completely sure about the bias/variance behaviour of boosted decision trees (LightGBM especially), so I wonder whether we would generally expect a performance boost from creating an ensemble of multiple LightGBM models, just as with Random Forest?
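For concreteness, here is a minimal sketch (an illustration, not a verified recipe) of what such an ensemble might look like: several LGBMRegressor models fitted on bootstrap resamples of synthetic data, with their predictions averaged Random-Forest-style. The data and hyperparameters are placeholders.

    import numpy as np
    from lightgbm import LGBMRegressor
    from sklearn.datasets import make_regression

    X, y = make_regression(n_samples=2000, n_features=20, random_state=0)
    rng = np.random.default_rng(0)

    # fit several LightGBM models, each on its own bootstrap resample
    models = []
    for seed in range(5):
        idx = rng.choice(len(X), size=len(X), replace=True)
        models.append(LGBMRegressor(random_state=seed).fit(X[idx], y[idx]))

    # average the individual predictions, as a Random Forest would
    ensemble_pred = np.mean([m.predict(X) for m in models], axis=0)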
I'm implementing a random forest for a 6-class classification problem and witnessing a strange phenomenon. I have 10 percent of my set sectioned off as a pseudo-validation set. I'm training each tree on a randomly selected 50 percent of the training items (the training items being 90 percent of the whole set). Now my OOB error is almost the mirror image of my validation error. I'm using averaged F1 error (i.e. the average of the F1 error per class). As more trees are …
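In case it helps to reproduce the setup, here is a sketch under assumed settings (synthetic 6-class data, 50% bootstrap per tree via max_samples) comparing OOB and validation macro-F1 in scikit-learn; the dataset and sizes are stand-ins, not my actual data.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    # synthetic stand-in for the real 6-class dataset
    X, y = make_classification(n_samples=5000, n_classes=6, n_informative=10,
                               random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.1, random_state=0)

    # max_samples=0.5 mimics training each tree on 50% of the training items
    rf = RandomForestClassifier(n_estimators=200, max_samples=0.5,
                                oob_score=True, random_state=0)
    rf.fit(X_tr, y_tr)

    # per-sample OOB votes; rows can be all-zero if a sample was never out-of-bag
    oob_pred = np.argmax(rf.oob_decision_function_, axis=1)
    print("OOB macro F1:", f1_score(y_tr, oob_pred, average="macro"))
    print("val macro F1:", f1_score(y_val, rf.predict(X_val), average="macro"))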
XGBRegressor is not performing better than AdaBoostRegressor for the same set of parameters, for some reason. Since my dataset is big, I made an example using sklearn's make_regression as follows.

    from sklearn.ensemble import AdaBoostRegressor
    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.linear_model import LinearRegression
    from xgboost import XGBRegressor

    X, y = make_regression(n_samples=10000, n_features=1, n_informative=1,
                           random_state=0, noise=1, shuffle=False)

    regr = LinearRegression()
    regr.fit(X, y)
    print(regr.score(X, y))

    regr = AdaBoostRegressor(DecisionTreeRegressor(max_depth=6),
                             n_estimators=10, random_state=0)
    regr.fit(X, y)
    print(regr.score(X, y))

    regr = XGBRegressor(max_depth=6, n_estimators=10, random_state=0)
    regr.fit(X, y)
    print …
Usually if we have $n$ observations, for each tree we form a bootstrapped subsample of size $n$ with replacement. On googling it, one common explanation I've seen is that sampling with replacement is necessary for the independence of the individual trees. But why can't we just resample as follows: for tree 1, randomly sample $m$ observations without replacement out of the $n$, where $m$ is still large enough (provided, of course, that $n$ is large enough in the first place). Then replenish …
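A toy sketch of the two schemes, just to fix notation ($n$ draws with replacement versus $m$ distinct draws):

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 10, 7
    data = np.arange(n)

    bootstrap = rng.choice(data, size=n, replace=True)    # duplicates possible
    subsample = rng.choice(data, size=m, replace=False)   # m distinct observations
    print(bootstrap, subsample)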
Stacking can be achieved with heterogeneous algorithms such as RF, SVM and KNN. However, can such heterogeneity be achieved in Bagging or Boosting? For example, in Boosting, instead of using RF in all the iterations, could we use different algorithms?
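For reference, the heterogeneous stacking described here looks roughly like this in scikit-learn (a sketch with default hyperparameters and toy data):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    X, y = make_classification(random_state=0)

    # three heterogeneous base learners combined by a logistic-regression meta-learner
    stack = StackingClassifier(
        estimators=[("rf", RandomForestClassifier(random_state=0)),
                    ("svm", SVC(random_state=0)),
                    ("knn", KNeighborsClassifier())],
        final_estimator=LogisticRegression())
    print(stack.fit(X, y).score(X, y))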
I found the definition: Bagging is to use the same training algorithm for every predictor, but to train them on different random subsets of the training set. When sampling is performed with replacement, this method is called bagging (short for bootstrap aggregating). When sampling is performed without replacement, it is called pasting. What is "replacement" in this context?
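"Replacement" means each drawn example is put back in the pool before the next draw, so the same example can be drawn more than once; a tiny numpy sketch of the contrast:

    import numpy as np

    rng = np.random.default_rng(0)
    data = np.array([10, 20, 30, 40, 50])

    # with replacement (bagging): each item goes back in the pool, so repeats can occur
    print(rng.choice(data, size=5, replace=True))
    # without replacement (pasting): every drawn item is distinct
    print(rng.choice(data, size=3, replace=False))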
I've been doing some research on ensemble learning and read that base models with high variance are often recommended (I can't remember exactly which book I read this in). But it seems counter-intuitive: wouldn't having base models with low variance (i.e. ones that do well on the test set) be better than having multiple bad base models?
This is a quote from "Hands-on machine learning with Scikit-Learn, Keras and TensorFlow" by Aurelien Geron: "Bootstrapping introduces a bit more diversity in the subsets that each predictor is trained on, so bagging ends up with a slightly higher bias than pasting, but this also means that predictors end up being less correlated so the ensemble’s variance is reduced." I can't understand why bagging, as compared to pasting, results in higher bias and lower variance. Can anyone provide an intuitive …
I am a bit confused about two concepts. From my understanding, bagging is when each data point is replaced after each choice: so, for example, for each subset of data you pick one point from the population, replace it, then pick one again, and so on, and this is repeated for each subset of data. But for pasting, people say it is sampling without replacement; does that mean you can't have the same data point in any subset? I thought it picks one subset w/o replacement but …
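To make the distinction concrete, a small sketch: with pasting, each subset is drawn without replacement, so there are no duplicates inside a subset, but the same observation can still appear in different subsets:

    import numpy as np

    rng = np.random.default_rng(0)
    data = np.arange(10)

    # three pasting subsets: distinct items within each, overlap across them
    subsets = [rng.choice(data, size=6, replace=False) for _ in range(3)]
    for s in subsets:
        print(sorted(s))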
If bagging reduces overfitting, then is the general statement that base learners of ensemble models should have high bias and low variance (that is, should be underfitting) wrong?
The accuracy of my bagging decision tree model reaches 97% when I set the random seed to 5, but drops to only 92% when I set the random seed to 0. Can someone explain this huge gap, and should I just report the highest accuracy in my research paper, or take the average over runs with random seed=None?
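If it helps, a sketch of the averaging option on placeholder data (the dataset and model settings are stand-ins for mine):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score

    # placeholder data standing in for the question's dataset
    X, y = make_classification(n_samples=500, random_state=42)

    # rerun the bagged-tree model under several seeds and report mean +/- std
    scores = [cross_val_score(BaggingClassifier(random_state=s), X, y, cv=5).mean()
              for s in range(10)]
    print(f"accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")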
I recently ran the gradient boosted tree regressor in scikit-learn via GradientBoostingRegressor(). This model depends on the following hyperparameters: Estimators ($N_1$), Min Samples Leaf ($N_2$) and Max Depth ($N_3$), which in turn determine the number of trainable parameters in this model. My question is: how can I count the number of parameters (trainable or otherwise randomly assigned) which determined the final model, as a function of the above? My guess is $N_1 \times N_2 \times N_3$, but is this correct?
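One way to inspect this empirically (a sketch, not a closed-form answer): count the fitted tree nodes, since each internal node stores a split feature and threshold and each leaf stores a value; the hyperparameters below are illustrative.

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_regression(n_samples=1000, n_features=10, random_state=0)
    gbr = GradientBoostingRegressor(n_estimators=100, min_samples_leaf=5,
                                    max_depth=3).fit(X, y)

    # estimators_ has shape (n_estimators, 1); each entry is a fitted regression tree
    total_nodes = sum(t.tree_.node_count for stage in gbr.estimators_ for t in stage)
    print("total nodes across all trees:", total_nodes)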
Bagging uses a decision tree as its base classifier. In my research I want to use bagging with a decision tree (C4.5) as the base learner, as a method for improving C4.5 that addresses the overfitting problem. Is that possible? Some lecturers said it is not right, because bagging is a separate classifier, not a hybrid of two.
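In scikit-learn the combination is one line; note that DecisionTreeClassifier is CART rather than C4.5, so this sketch is only an analogue of the described setup:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # bagging wraps the tree; the base learner is resampled, not replaced or hybridized
    bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)
    print(cross_val_score(bag, X, y, cv=5).mean())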
The bootstrap is due to Efron; Tibshirani wrote a book about it with Efron. The bootstrap process for estimating the standard error of a statistic $s(x)$: $B$ bootstrap samples are generated from the original data, and finally the standard deviation of the values $s(x_1), s(x_2), \dots, s(x_B)$ is our estimate of the standard error of $s(x)$. The bootstrap estimate of the standard error is the standard deviation of the bootstrap replications. Typical values for $B$, the number of bootstrap samples, range from 50 to 200 for standard-error estimation. Breiman wrote …
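The procedure described above, as a minimal sketch (the statistic and data are placeholders; the mean is used so the answer can be checked against the plug-in formula):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=5.0, scale=2.0, size=100)   # placeholder sample
    B = 200                                        # number of bootstrap replications

    def s(sample):
        return np.mean(sample)                     # statistic of interest (here the mean)

    # B bootstrap replications of s, then their standard deviation
    reps = [s(rng.choice(x, size=len(x), replace=True)) for _ in range(B)]
    print("bootstrap SE estimate:", np.std(reps, ddof=1))
    print("plug-in SE of the mean:", x.std(ddof=1) / np.sqrt(len(x)))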
Bagging, or bootstrap aggregation, seems to make sense for time series forecasting with an ensemble, because bagging randomizes subsets of the data with replacement. However, I've only seen bagging used with homogeneous base learners when constructing ensembles. Stacking is another ensemble technique, one that uses heterogeneous base learners, but stacking employs cross-validation, which I don't view as appropriate for economic time series forecasting, even if a time-series-split cross-validation that retains the ordering of observations is used. As you can …
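For reference, the ordering-preserving split mentioned here is scikit-learn's TimeSeriesSplit; a toy sketch:

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    X = np.arange(20).reshape(-1, 1)   # a toy ordered series

    # each training fold contains only observations that precede its test fold
    for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
        print("train:", train_idx, "test:", test_idx)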
In order to address an imbalanced dataset problem, I experimented with Random Forest in the following manner (somewhat inspired by deep learning): I trained a Random Forest, and the predicted class probabilities from that trained model are used as input to train another Random Forest. Pseudo-code for this:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split

    train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2)
    rf_model = RandomForestClassifier()
    rf_model.fit(train_X, train_y)
    pred = rf_model.predict(test_X)
    print('******************RANDOM FOREST CM*******************************')
    print(confusion_matrix(test_y, …
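For clarity, here is a self-contained sketch of the two-stage idea on placeholder imbalanced data (not my actual pipeline; note the second forest's training probabilities come from the same data the first forest was fitted on, so they will be optimistic):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # placeholder imbalanced data standing in for the real X, y
    X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
    train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2,
                                                        random_state=0)

    rf1 = RandomForestClassifier(random_state=0).fit(train_X, train_y)

    # feed the first forest's class probabilities into a second forest
    rf2 = RandomForestClassifier(random_state=0).fit(rf1.predict_proba(train_X), train_y)
    print(rf2.score(rf1.predict_proba(test_X), test_y))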
I have a conceptual question. My understanding is that Random Forest can be applied even when features are (highly) correlated. This is because with bagging, the influence of a few highly correlated features is moderated, since each feature only occurs in some of the trees that are finally used to build the overall model. My question: with boosting, usually even smaller trees (basically "stumps") are used. Is it a problem to have many (highly) correlated features in a boosting approach?