How to use a set of pre-defined classifiers in Adaboost?

Suppose there are some classifiers as follows:

    dt = DecisionTreeClassifier(max_depth=DT_max_depth, random_state=0)
    rf = RandomForestClassifier(n_estimators=RF_n_est, random_state=0)
    xgb = XGBClassifier(n_estimators=XGB_n_est, random_state=0)
    knn = KNeighborsClassifier(n_neighbors=KNN_n_neigh)
    svm1 = svm.SVC(kernel='linear')
    svm2 = svm.SVC(kernel='rbf')
    lr = LogisticRegression(random_state=0, penalty=LR_n_est, solver='saga')

In AdaBoost, I can define a base_estimator and also the number of estimators. However, I want to use these 7 classifiers. In other words, n_estimators=7 and the estimators are the ones above. How can I define this model?
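scikit-learn's AdaBoostClassifier clones a single base learner (`estimator`, called `base_estimator` before version 1.2), so it does not accept a list of different classifiers. One option is to write the boosting loop by hand. The sketch below is an illustration under stated assumptions, not a library API: `fit_heterogeneous_samme` and `predict_heterogeneous_samme` are hypothetical names, each pre-defined model gets exactly one SAMME round in list order, and for models whose `fit` has no `sample_weight` argument (such as KNeighborsClassifier) weighted resampling is substituted, which is a design choice made here rather than something AdaBoost prescribes. XGBClassifier is left out so the example only needs scikit-learn.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import has_fit_parameter


def fit_heterogeneous_samme(estimators, X, y, random_state=0):
    """Hypothetical helper: one SAMME boosting round per pre-defined estimator, in list order."""
    rng = np.random.default_rng(random_state)
    X, y = np.asarray(X), np.asarray(y)
    classes = np.unique(y)
    K, n = len(classes), len(y)
    w = np.full(n, 1.0 / n)                      # uniform initial sample weights
    fitted, alphas = [], []
    for est in estimators:
        est = clone(est)                         # leave the caller's objects untouched
        if has_fit_parameter(est, "sample_weight"):
            est.fit(X, y, sample_weight=w)
        else:
            # assumption: emulate the weights by weighted resampling (e.g. for KNN)
            idx = rng.choice(n, size=n, replace=True, p=w)
            est.fit(X[idx], y[idx])
        miss = est.predict(X) != y
        err = np.sum(w[miss])                    # w sums to 1, so this is the weighted error
        if err >= 1.0 - 1.0 / K:
            continue                             # no better than chance under these weights: skip it
        err = max(err, 1e-10)                    # keep the log finite for a perfect model
        alpha = np.log((1.0 - err) / err) + np.log(K - 1.0)  # SAMME estimator weight
        w = w * np.exp(alpha * miss)             # up-weight the samples this model got wrong
        w = w / w.sum()
        fitted.append(est)
        alphas.append(alpha)
    return classes, fitted, alphas


def predict_heterogeneous_samme(classes, fitted, alphas, X):
    """Weighted vote: each fitted model votes for its predicted class with its SAMME weight."""
    X = np.asarray(X)
    votes = np.zeros((len(X), len(classes)))
    for est, alpha in zip(fitted, alphas):
        cols = np.searchsorted(classes, est.predict(X))
        votes[np.arange(len(X)), cols] += alpha
    return classes[votes.argmax(axis=1)]


# Usage with scikit-learn stand-ins for some of the question's models.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
models = [
    DecisionTreeClassifier(max_depth=1, random_state=0),
    LogisticRegression(),
    SVC(kernel="linear"),
    KNeighborsClassifier(n_neighbors=5),
]
classes, fitted, alphas = fit_heterogeneous_samme(models, X, y)
train_acc = np.mean(predict_heterogeneous_samme(classes, fitted, alphas, X) == y)
print(len(fitted), round(train_acc, 3))
```

Because every model runs exactly once, the order of the list matters: later models see weights shaped by earlier ones. Stacking or voting ensembles are the more common way to combine unrelated classifiers.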
Category: Data Science

Scikit-learn's implementation of AdaBoost

I am trying to implement the AdaBoost algorithm in pure Python (or using NumPy if necessary). I loop over all weak classifiers (in this case decision stumps), then over all features, and then over all possible values of each feature to see which one splits the dataset best. This is my code:

    for _ in range(self.n_classifiers):
        classifier = BaseClassifier()
        min_error = np.inf
        # greedy search to find best threshold and feature
        for feature_i in range(n_features):
            thresholds = np.unique(X[:, feature_i])
            for …
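The triple loop described above amounts to a weighted stump search. This is a minimal sketch, assuming labels in {-1, +1}, weights `w` taken from the current boosting round, and a stump of the form "predict `polarity` where x ≥ threshold, `-polarity` elsewhere"; `best_stump` is a hypothetical name and is not the question's `BaseClassifier`.

```python
import numpy as np

def best_stump(X, y, w):
    """Exhaustive weighted search over features, thresholds and polarities.

    X: (n_samples, n_features); y: labels in {-1, +1}; w: sample weights.
    Returns (weighted_error, feature_index, threshold, polarity).
    """
    best = (np.inf, None, None, None)
    for feature_i in range(X.shape[1]):
        for threshold in np.unique(X[:, feature_i]):  # every observed value is a candidate
            for polarity in (1, -1):
                # predict `polarity` where x >= threshold and `-polarity` elsewhere
                pred = np.where(X[:, feature_i] >= threshold, polarity, -polarity)
                error = np.sum(w[pred != y])          # weighted misclassification error
                if error < best[0]:
                    best = (error, feature_i, threshold, polarity)
    return best

X = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]])
y = np.array([-1, -1, 1, 1])
w = np.full(len(y), 0.25)
error, feature_i, threshold, polarity = best_stump(X, y, w)
print(error, feature_i, threshold, polarity)
```

Trying both polarities matters: without it, a feature that separates the classes "the other way round" would look like a bad split.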
Category: Data Science

Does tuning a Decision Tree first and then using it in AdaBoost yield the same results as tuning both simultaneously?

So, my predicament is as follows: I performed hyperparameter tuning on a standalone Decision Tree classifier and got the best results. Now comes the turn of standalone AdaBoost, and here is my problem: if I use the tuned Decision Tree from earlier as the base_estimator in AdaBoost and then perform hyperparameter tuning on AdaBoost only, will it yield the same results as performing hyperparameter tuning on an untuned AdaBoost and an untuned Decision Tree as a …
Category: Data Science

Are train and test data fixed during boosting?

I have a question about boosting algorithms. I know that boosting is a sequential process and that it gives a higher weight to the samples misclassified by the previous model. Are its train and test data then fixed throughout this sequential process? Does it predict the data used for training to determine which samples are misclassified, and then give them a larger weight when training the next model? Thanks in advance for the discussion.
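In scikit-learn's AdaBoostClassifier, for example, every round is fitted on the same training set; what changes is the sample-weight vector, which is computed from each learner's predictions on that training set. The test set plays no part in fitting. A sketch on synthetic data that uses the test set only to score the ensemble after each round:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# All boosting rounds see the same X_train / y_train; only the internal
# sample weights differ from round to round.
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# The held-out data is used only after fitting, to evaluate the ensemble
# built from the first 1, 2, ..., m learners.
test_scores = list(clf.staged_score(X_test, y_test))
print(len(test_scores), round(test_scores[-1], 3))
```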
Category: Data Science

Why does a classifier's (XGBoost) runtime increase “after PCA” compared to “before PCA”?

The short version: I am trying to compare different classifiers on a certain dataset from Kaggle, and I am also comparing each classifier before and after applying PCA (from sklearn) in terms of accuracy and runtime. For some reason, the runtime of the classifiers after PCA (XGBoost and AdaBoost, to take two examples) is approximately 3 times their runtime before PCA. My question is: why? …
Category: Data Science

Evaluating optimal values for depth of tree

I'm studying the performance of an AdaBoost model and I wonder how it performs with respect to the depth of the trees. Here's the accuracy for the model with a depth of 1, and here with a depth of 3. From my point of view, I would say the lower one looks better, but somehow I guess the upper one is better, as its training accuracy doesn't vanish (overfitting?). The question and answer from Hyperparameter tuning for Random Forest - choose …
Category: Data Science

Forecasting: Multiple Linear Regression (OLS) outperforms Random Forests / Gradient Boosting / AdaBoost

I'm using different forecasting methods on a dataset to compare their accuracy. For some reason, multiple linear regression (OLS) outperforms RF, GB and AdaBoost on MAE, RMSE, R^2 and MAPE. This is very surprising to me. Is there any general reason that could explain this? I know that ML methods don't perform well on datasets with a small number of samples, but this should not be the case here. I'm a …
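One common reason, offered as an assumption about the data rather than a diagnosis of this case, is a trend that carries the forecast period outside the training range. Tree-based models such as RF, GB and AdaBoost with trees predict piecewise-constant values bounded by the training targets, so they cannot extrapolate, while OLS can. A sketch on synthetic trended data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
# Synthetic series with a linear trend: train on t in [0, 10), forecast t in [10, 15).
t_train = np.linspace(0, 10, 200, endpoint=False).reshape(-1, 1)
t_test = np.linspace(10, 15, 50, endpoint=False).reshape(-1, 1)
y_train = 2.0 * t_train.ravel() + rng.normal(0, 1, 200)
y_test = 2.0 * t_test.ravel() + rng.normal(0, 1, 50)

ols = LinearRegression().fit(t_train, y_train)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(t_train, y_train)

# The forest's predictions flatten out beyond t = 10, while OLS follows the trend.
mae_ols = mean_absolute_error(y_test, ols.predict(t_test))
mae_rf = mean_absolute_error(y_test, rf.predict(t_test))
print(f"OLS MAE={mae_ols:.2f}, RF MAE={mae_rf:.2f}")
```

If this is the cause, detrending or differencing the target before fitting the tree models is a common remedy.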
Category: Data Science

AdaBoost implementation and tuning for high dimensional feature space in R

I am trying to apply the AdaBoost.M1 algorithm (trees as base learners) to a data set with a large feature space (~20,000 features) and ~100 samples in R. There are several packages for this purpose: adabag, ada and gbm. gbm() (from the gbm package) appears to be my only available option, as a stack overflow error occurs in the others, and though it works, it is very time-consuming. Questions: Is there any way to overcome the stack overflow the …
Category: Data Science

Is the way to combine weak learners in AdaBoost for regression arbitrary?

I'm reading about how variants of boosting combine weak learners into the final prediction. The case I'm considering is regression. In the paper Improving Regressors using Boosting Techniques, the final prediction is the weighted median. For a particular input $x_{i}$, each of the $T$ machines makes a prediction $h_{t}$, $t=1, \ldots, T$. Obtain the cumulative prediction $h_{f}$ using the $T$ predictors: $$h_{f}=\inf\left\{y \in Y: \sum_{t: h_{t} \leq y} \log \left(1 / \beta_{t}\right) \geq \frac{1}{2} \sum_{t} \log \left(1 / \beta_{t}\right)\right\}$$ This is …
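The weighted-median rule above can be computed by sorting the $T$ predictions and accumulating the weights $\log(1/\beta_t)$ until they reach half of the total. Because the cumulative sum only changes at the predictions themselves, the infimum is attained at one of the $h_t$. A sketch for a single input, with a hypothetical function name and assuming every $\beta_t \in (0, 1)$:

```python
import numpy as np

def weighted_median_prediction(h, beta):
    """Smallest h_t whose cumulative log(1/beta) weight reaches half the total.

    h: predictions h_t of the T machines for one input; beta: their beta_t values.
    """
    h = np.asarray(h, dtype=float)
    weights = np.log(1.0 / np.asarray(beta, dtype=float))
    order = np.argsort(h)                               # sort predictions ascending
    cum = np.cumsum(weights[order])                     # sum of log(1/beta_t) over h_t <= y
    k = np.searchsorted(cum, 0.5 * weights.sum())       # first index reaching half the total
    return h[order][k]

equal = weighted_median_prediction([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
skewed = weighted_median_prediction([1.0, 2.0, 10.0], [0.1, 0.9, 0.9])
print(equal, skewed)
```

With equal $\beta_t$ this is the ordinary median; a machine with a small $\beta_t$ (low average loss) can dominate on its own, as in the second call.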
Category: Data Science

Why does AdaBoost SAMME need f to be estimable?

I am trying to understand the mathematics behind SAMME AdaBoost. At some stage, the paper adds a constraint so that f is estimable, and I do not understand why this is required. Can someone explain why this restriction is needed? Also, would it be possible to use a different constraint from the one in the paper that would still make f estimable?
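Assuming the constraint in question is the sum-to-zero condition $f_1 + \cdots + f_K = 0$ from the multi-class AdaBoost (SAMME) paper by Zhu et al., the need for it can be checked numerically. With the SAMME coding, the entries of $y$ sum to zero, so $y^\top(f + c\mathbf{1}) = y^\top f$ for every constant $c$. The exponential loss $\exp(-\frac{1}{K} y^\top f)$ therefore cannot tell $f$ from $f + c\mathbf{1}$, and without a constraint the minimiser is determined only up to that constant. The function names below are hypothetical.

```python
import numpy as np

def samme_code(label, K):
    """SAMME coding of class `label`: 1 at the true class, -1/(K-1) at the others."""
    y = np.full(K, -1.0 / (K - 1))
    y[label] = 1.0
    return y

def samme_exp_loss(y_code, f, K):
    """Multi-class exponential loss exp(-(1/K) * y^T f)."""
    return np.exp(-np.dot(y_code, f) / K)

K = 4
y_code = samme_code(2, K)
f = np.array([0.3, -1.2, 2.0, 0.5])
shift = 3.7 * np.ones(K)            # add the same constant to every f_k

print(y_code.sum())                 # the code vector sums to 0 (up to rounding)
print(samme_exp_loss(y_code, f, K), samme_exp_loss(y_code, f + shift, K))

f_centered = f - f.mean()           # the representative with f_1 + ... + f_K = 0
```

By the same argument, any other condition that pins down the constant (for example fixing one component to 0, as multinomial logistic regression does with a reference class) would seem to remove this ambiguity. That is an inference from the invariance, not a claim the paper makes.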
Category: Data Science

How does AdaBoost calculate the error for each weak learner in training?

I am studying the AdaBoost classification algorithm because I would like to implement it from scratch. I understand how it works, but I am not able to understand where some steps are placed. I will describe the AdaBoost training steps as I understand them (sorry for any incorrect formalism):

1. Initialize a weak learner $k$.
2. Give each sample in the dataset an equal weight $w = \frac{1}{N}$.
3. Fit $k$ to the dataset.
4. Calculate the error $e = \sum_{i=1}^{N} e_i w_i$.
5. Calculate the importance $\alpha$ of $k$, …
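In the steps listed above, the error is computed after $k$ has been fitted, from $k$'s predictions on the same training samples under the current weights; then $\alpha$ follows, and the weights are updated before the next learner is fitted. A minimal sketch of that bookkeeping for binary AdaBoost with labels in {-1, +1}, using the Freund–Schapire form $\alpha = \frac{1}{2}\ln\frac{1-e}{e}$ (SAMME uses a different but equivalent-in-effect scaling); `adaboost_round` is a hypothetical name:

```python
import numpy as np

def adaboost_round(y, pred, w):
    """One weight-update step of binary AdaBoost, labels in {-1, +1}.

    y, pred: true labels and the fitted weak learner's predictions on the
    *training* samples; w: current weights summing to 1. Assumes 0 < error < 1.
    """
    miss = pred != y
    err = np.sum(w[miss])                    # e = sum_i e_i w_i, with e_i = 1 when misclassified
    alpha = 0.5 * np.log((1.0 - err) / err)  # importance of this learner
    w_new = w * np.exp(-alpha * y * pred)    # shrink correct weights, grow misclassified ones
    return err, alpha, w_new / w_new.sum()   # renormalise so the weights sum to 1

y = np.array([1, 1, -1, -1])
pred = np.array([1, 1, -1, 1])   # one mistake, on the last sample
w = np.full(4, 0.25)             # step 2: w = 1/N
err, alpha, w = adaboost_round(y, pred, w)
print(err, alpha, w)
```

After the update the misclassified samples carry exactly half of the total weight, which is what makes the same learner no better than chance in the next round.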
Category: Data Science

Fitting AdaBoost with other classifiers

It is possible to fit decision trees with other decision trees. For example:

    adaclassification = AdaBoostClassifier(RandomForestClassifier(n_jobs=-1))
    adaclassification.fit(X_train, y_train)

I got better results with the random forest, so it improved the result from AdaBoost. However, I don't understand what's happening here. It sounds easy: AdaBoost uses a random forest to fit its classification. But what's going on mathematically? AdaBoost is built on the residuals as a sequence (boosting), while random forest (bagging) builds a forest out of trees.
Category: Data Science

Does gradient boosting algorithm error always decrease faster and lower on training data?

I am building another XGBoost model and I'm really trying not to overfit the data. I split my data into train and test sets and fit the model with early stopping based on the test-set error, which results in a loss plot that I'd say is pretty standard for boosting algorithms such as XGBoost. My reasoning is that my main interest is the performance on the test set, and until XGBoost stopped training around the 600th epoch …
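For squared-error loss there is a simple reason the training curve keeps falling. Each stage fits a tree to the current residuals and, within every leaf, adds `learning_rate` times the leaf mean, which cannot increase the training sum of squares when 0 < learning_rate < 2. The check below uses scikit-learn's GradientBoostingRegressor on synthetic data as a stand-in; XGBoost's regularised objective and options differ, so the guarantee is not claimed for it.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=10, noise=20.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gbr = GradientBoostingRegressor(n_estimators=300, learning_rate=0.1, max_depth=3, random_state=0)
gbr.fit(X_tr, y_tr)

# Squared-error loss after each boosting stage, on training and held-out data.
train_mse = np.array([mean_squared_error(y_tr, p) for p in gbr.staged_predict(X_tr)])
test_mse = np.array([mean_squared_error(y_te, p) for p in gbr.staged_predict(X_te)])

print(train_mse[[0, 99, 299]], test_mse[[0, 99, 299]])
print("stage with lowest test MSE:", int(test_mse.argmin()) + 1)
```

The held-out curve carries no such guarantee, which is why early stopping is driven by it rather than by the training loss.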
Category: Data Science

AdaBoost.R2 learning rate in scikit-learn

AdaBoost.R2 (regression) is presented in Drucker's paper "Improving Regressors using Boosting Techniques", which is freely available on Google Scholar. scikit-learn's regression implementation of AdaBoost uses this algorithm (the paper is cited in the source of the AdaBoostRegressor class). The thing is that one step is fundamentally different from Drucker's original version: the introduction of a new parameter named 'learning rate' for the AdaBoost algorithm. I will use $\eta$ as notation for …
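As far as can be read from the `AdaBoostRegressor` source, the learning rate $\eta$ enters in two places: it multiplies the estimator weight, and it scales the exponent of the sample-weight update. Here $\bar L_t$ is the average loss of machine $t$ and $L_i$ the per-sample loss, in Drucker's notation:

$$\beta_t = \frac{\bar L_t}{1-\bar L_t}, \qquad \text{weight}_t = \eta \log\frac{1}{\beta_t}, \qquad w_i \leftarrow w_i\,\beta_t^{\,\eta\,(1-L_i)}$$

With $\eta = 1$ this reduces to Drucker's original update. The final prediction is a weighted median, and multiplying every weight by the same $\eta > 0$ does not move a weighted median, so the practical effect of $\eta$ appears to come from the sample-weight update alone.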
Category: Data Science

Explanation of some steps of AdaBoost.R2

I am trying to understand AdaBoost.R2 in order to implement it and apply it to a regression problem. In these circumstances I need to understand it perfectly, but there are some steps I don't really get. The paper is available here, and AdaBoost.R2 is presented in section 3: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.31.314&rep=rep1&type=pdf In step 4, $\sup|.|$ is used; I've never seen that notation, what does it mean exactly? In step 7, "** means exponentiation"; in that case, that would mean $w_i\beta * \exp([1-L_i])$, right?
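Assuming Drucker's linear loss, steps 4 to 7 can be sketched for one machine. $\sup$ is the supremum, which over a finite training set is simply the largest absolute error. `**` is exponentiation, so step 7 multiplies $w_i$ by $\beta^{1-L_i}$ and no $\exp$ is involved. The function name is hypothetical, and the resampling step and the stop when $\bar L \ge 0.5$ are left out.

```python
import numpy as np

def adaboost_r2_round(y, pred, w):
    """Steps 4-7 of AdaBoost.R2 with the linear loss (sketch).

    y, pred: targets and machine t's predictions on the training set.
    w: current sample weights. Assumes 0 < average loss < 0.5.
    Returns (beta, new_weights).
    """
    y, pred, w = (np.asarray(a, dtype=float) for a in (y, pred, w))
    p = w / w.sum()                   # weights as probabilities
    abs_err = np.abs(pred - y)
    D = abs_err.max()                 # step 4: D = sup_i |y_i - h_t(x_i)|, the largest error
    L = abs_err / D                   # linear loss, scaled into [0, 1]
    L_bar = np.sum(L * p)             # step 5: average loss
    beta = L_bar / (1.0 - L_bar)      # step 6: confidence measure
    w_new = w * beta ** (1.0 - L)     # step 7: w_i * beta^(1 - L_i)
    return beta, w_new

beta, w_new = adaboost_r2_round([0, 0, 0, 0], [0, 1, 0, 2], [1, 1, 1, 1])
print(beta, w_new)
```

Samples with the largest error ($L_i = 1$) keep their weight, while well-predicted samples are multiplied by $\beta < 1$, so after normalisation the hard samples gain relative weight.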
Category: Data Science

Understanding additive function approximation or Understanding matching pursuit

I am trying to read Greedy Function Approximation: A Gradient Boosting Machine. On page 4 (marked as page 1192), under section 3, Finite data, the author explains how the function approximation approach breaks down when we have finite data, and that some way of imposing smoothness is needed to obtain a function that can be used at points other than those in the training dataset. One way he suggests is to use parametric base functions (like in neural …
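In Friedman's notation, reproduced here from memory so symbols may differ slightly from the paper, the parametric route expands $F$ as a sum of base functions and fits them greedily, one term at a time:

$$F(\mathbf{x}) = \sum_{m=0}^{M} \beta_m\, h(\mathbf{x}; \mathbf{a}_m)$$

$$(\beta_m, \mathbf{a}_m) = \arg\min_{\beta, \mathbf{a}} \sum_{i=1}^{N} L\bigl(y_i,\, F_{m-1}(\mathbf{x}_i) + \beta\, h(\mathbf{x}_i; \mathbf{a})\bigr), \qquad F_m(\mathbf{x}) = F_{m-1}(\mathbf{x}) + \beta_m\, h(\mathbf{x}; \mathbf{a}_m)$$

Earlier terms are never refitted. With squared-error loss and $h$ drawn from a fixed dictionary of functions, this stagewise scheme is what signal processing calls matching pursuit.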
Category: Data Science

AdaBoost decision_function() outputs in binary classification with sklearn

As I understand it from studying the source code, I would expect the values returned by decision_function() with AdaBoost to be bounded between -1 and 1, because it's the weighted average of the probabilities. However, a histogram of the outputs shows values ranging from a little under -2 to a little over +2. Why is this? Am I misunderstanding how these values are calculated?
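As far as can be read from scikit-learn's source for the SAMME algorithm (details vary between versions), each estimator adds $+w$ to the column of the class it predicts and $-w/(K-1)$ to every other column; for $K = 2$ that is $-w$. After dividing by the sum of the weights each column lies in $[-1, 1]$, and for binary problems the class-0 column is negated and the two columns are summed, which gives the range $[-2, 2]$. The older SAMME.R option aggregates centred log-probabilities instead, so that bound does not apply to it. A re-creation of the binary SAMME arithmetic, with a hypothetical function name and no scikit-learn calls:

```python
import numpy as np

def binary_samme_decision(preds, weights, classes=(0, 1)):
    """Re-creation of a binary SAMME decision score.

    preds: (n_estimators, n_samples) predicted labels; weights: (n_estimators,).
    """
    preds = np.asarray(preds)
    weights = np.asarray(weights, dtype=float)
    classes = np.asarray(classes)
    K = len(classes)
    score = np.zeros((preds.shape[1], K))
    for p, w in zip(preds, weights):
        hit = p[:, None] == classes[None, :]     # one-hot of the predicted class
        score += np.where(hit, w, -w / (K - 1))  # +w for it, -w/(K-1) (= -w when K = 2) otherwise
    score /= weights.sum()                       # each column now lies in [-1, 1]
    return score[:, 1] - score[:, 0]            # negate class-0 column and add: range [-2, 2]

preds = np.array([[1, 0, 0],
                  [1, 1, 0]])
weights = np.array([1.0, 1.0])
scores = binary_samme_decision(preds, weights)
print(scores)
```

Unanimous votes reach the extremes of $\pm 2$; a split vote lands in between.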
Category: Data Science

Every machine learning model I build mispredicts almost the same samples (Random Forest, XGBoost, AdaBoost)

First of all, I'd like to apologize for any spelling or grammar mistakes. I'm having a problem using R for a classification problem. My dataset contains ~300,000 genomic data points, and the features are DNA-related (number of dinucleotides, number of trinucleotides, the CG content, and some more). In short, I have a dataset of 300,000 rows and 84 columns (columns = features). The 84th column is essentially the class variable (there are two classes: class 1 and class 2). I …
Category: Data Science

Formula to calculate the confidence value in AdaBoost

I am coding an AdaBoostClassifier with the two-class variant of the SAMME algorithm. Here is the code.

    def I(flag):
        return 1 if flag else 0

    def sign(x):
        return abs(x)/x if x!=0 else 1

AdaBoost class:

    class AdaBoost:
        def __init__(self, n_estimators=50):
            self.n_estimators = n_estimators
            self.models = [None]*n_estimators

        def fit(self, X, y):
            X = np.float64(X)
            N = len(y)
            w = np.array([1/N for i in range(N)])
            for m in range(self.n_estimators):
                Gm = DecisionTreeClassifier(max_depth=1)\
                    .fit(X, y, sample_weight=w).predict
                errM = sum([w[i]*I(y[i]!=Gm(X[i].reshape(1,-1))) \
                            for i in range(N)])/sum(w)
                '''Confidence Value'''
                #BetaM = …
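For comparison, here is a compact, vectorised SAMME sketch with decision stumps; the class name `SAMMESketch` is made up and this is not scikit-learn's code. It uses the SAMME confidence value from Zhu et al., $\alpha_m = \log\frac{1-\mathrm{err}_m}{\mathrm{err}_m} + \log(K-1)$, which for two classes reduces to $\log\frac{1-\mathrm{err}_m}{\mathrm{err}_m}$. That is twice the $\frac{1}{2}\log\frac{1-\mathrm{err}}{\mathrm{err}}$ of the original binary AdaBoost. Because the weights here are multiplied by $e^{\alpha_m I(y_i \ne G_m(x_i))}$ rather than $e^{-\alpha y_i G_m(x_i)}$, both forms give the same normalised weights and the same sign of the final vote.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class SAMMESketch:
    """Minimal SAMME with decision stumps (illustrative only)."""

    def __init__(self, n_estimators=50):
        self.n_estimators = n_estimators

    def fit(self, X, y):
        X = np.asarray(X, dtype=np.float64)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        K = len(self.classes_)
        w = np.full(len(y), 1.0 / len(y))          # uniform initial weights
        self.models_, self.alphas_ = [], []
        for _ in range(self.n_estimators):
            stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
            miss = stump.predict(X) != y            # vectorised I(y_i != G_m(x_i))
            err = np.sum(w[miss]) / np.sum(w)       # weighted error errM
            if err >= 1.0 - 1.0 / K:                # no better than chance: stop
                break
            perfect = err == 0.0
            err = max(err, 1e-10)                   # keep the log finite for a perfect stump
            # confidence value: log((1 - err) / err) + log(K - 1); the second term is 0 for K = 2
            alpha = np.log((1.0 - err) / err) + np.log(K - 1.0)
            w = w * np.exp(alpha * miss)            # up-weight misclassified samples only
            w = w / w.sum()
            self.models_.append(stump)
            self.alphas_.append(alpha)
            if perfect:                             # weights would no longer change, so stop
                break
        if not self.models_:
            raise RuntimeError("no stump did better than random guessing")
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=np.float64)
        rows = np.arange(X.shape[0])
        votes = np.zeros((X.shape[0], len(self.classes_)))
        for stump, alpha in zip(self.models_, self.alphas_):
            cols = np.searchsorted(self.classes_, stump.predict(X))
            votes[rows, cols] += alpha              # weighted vote for the predicted class
        return self.classes_[votes.argmax(axis=1)]

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = SAMMESketch(n_estimators=10).fit(X, y)
print(model.predict(X), len(model.alphas_))
```

Predicting all samples at once, instead of one `reshape(1, -1)` call per sample inside a list comprehension, gives the same errors and is much faster.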
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.