Incorrect multi-variate anomaly detection - Isolation Forest Python

My data looks like the table below: 333 rows and 2 columns. Clearly the first row is an anomaly.

ndf:
+----+---------+-------------+
|    | ROW_CNT | TOT_SALE    |
+----+---------+-------------+
|  0 |      45 |     1411.27 |
|  1 |   47754 |  1596200.68 |
|  2 |  105894 |  3750304.55 |
|  3 |  372953 | 14368324.86 |
|  4 |  389915 | 14899302.85 |
|  5 |  379473 | 14696309.67 |
|  6 |  388571 | …
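A minimal sketch of one common setup, assuming the data is a pandas DataFrame like ndf above: scale the two columns first (they differ by several orders of magnitude) and let IsolationForest flag outliers. The contamination value is a guess to tune, not a recommendation.

import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# First five rows of ndf as stand-in data for the full 333-row frame
ndf = pd.DataFrame({"ROW_CNT": [45, 47754, 105894, 372953, 389915],
                    "TOT_SALE": [1411.27, 1596200.68, 3750304.55,
                                 14368324.86, 14899302.85]})

# Bring both features onto a comparable scale before fitting
X = StandardScaler().fit_transform(ndf[["ROW_CNT", "TOT_SALE"]])

# contamination ~ expected anomaly fraction (0.2 suits 5 demo rows;
# something nearer 0.01 would be more plausible for all 333 rows)
iso = IsolationForest(n_estimators=200, contamination=0.2, random_state=0)
ndf["flag"] = iso.fit_predict(X)   # -1 = anomaly, 1 = normal
print(ndf[ndf["flag"] == -1])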
Category: Data Science

Can numerical encoding really replace one-hot encoding?

I am reading the articles below, which advocate numerical (integer) encoding rather than one-hot encoding for better interpretability of the feature-importance output of ensemble models. This goes against everything I have learnt: won't the model treat nominal features (like cities or car make/model) as ordinal if I encode them as integers?
https://krbnite.github.io/The-Quest-for-Blackbox-Interpretability-Take-1/
https://medium.com/data-design/visiting-categorical-features-and-encoding-in-decision-trees-53400fa65931
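A small sketch of the concern on a toy nominal column: OrdinalEncoder assigns arbitrary integers (alphabetical here), which reads as an order, while OneHotEncoder creates one indicator column per category. Tree ensembles only split on thresholds, which is why the articles argue the fake order does them little harm.

import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical nominal feature with no natural order
cities = pd.DataFrame({"city": ["Paris", "Tokyo", "Lima", "Tokyo"]})

# Integer encoding imposes Lima(0) < Paris(1) < Tokyo(2)
print(OrdinalEncoder().fit_transform(cities).ravel())   # [1. 2. 0. 2.]

# One-hot encoding: one 0/1 column per city, no implied order
print(OneHotEncoder().fit_transform(cities).toarray())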
Category: Data Science

How to apply Stacking cross validation for time-series data?

Normally, a stacking algorithm uses K-fold cross-validation to produce out-of-fold (OOF) predictions, which then serve as the training data for the level-2 model. For time-series data (say, stock movement prediction), K-fold cross-validation can't be used; time-series validation (the splitter suggested in the sklearn lib) is the suitable way to evaluate model performance. In that scheme, no prediction is made on the first fold and no training is done on the last fold. How do we use the stacking cross-validation technique for time-series data?
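One workable pattern, sketched below on placeholder arrays X and y that are assumed to be in time order: walk forward with TimeSeriesSplit, fill in OOF predictions for each test window, and train the level-2 model only on the rows that actually received one (the first training window gets none, exactly as noted above).

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 5)), rng.normal(size=500)   # time-ordered toy data

base_models = [RandomForestRegressor(n_estimators=50, random_state=0), Ridge()]
oof = np.full((len(y), len(base_models)), np.nan)

for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    for j, model in enumerate(base_models):
        model.fit(X[train_idx], y[train_idx])           # fit on the past only
        oof[test_idx, j] = model.predict(X[test_idx])   # predict the next window

mask = ~np.isnan(oof).any(axis=1)       # rows that got an OOF prediction
meta = Ridge().fit(oof[mask], y[mask])  # level-2 model trained on OOF features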
Category: Data Science

Feature Selection using Stacking Ensemble?

I want to combine several estimators, such as Logistic Regression, Gaussian NB and K-Nearest Neighbors, for feature selection. I tried using the StackingClassifier() estimator to do that, but there is no feature_importances_ attribute on this estimator. Is there any other method to select features that combines those classifiers? Thank you in advance :)
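One option, sketched below on toy data: sklearn's permutation_importance works with any fitted estimator, StackingClassifier included, so features can be ranked without a feature_importances_ attribute. The data and model settings here are illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB()),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X, y)

# Shuffle each feature and measure the score drop: model-agnostic importance
result = permutation_importance(stack, X, y, n_repeats=10, random_state=0)
print("features ranked by importance:", result.importances_mean.argsort()[::-1])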
Category: Data Science

How to apply ensemble clustering method?

I need to apply an ensemble clustering method to my data set using Python. I have already applied k-means clustering with the scikit-learn library. I have also tried different classification methods, and scikit-learn does offer ensemble classification methods. Now I am confused: does scikit-learn provide any library for ensemble clustering, and if not, how can I apply an ensemble clustering method to my data set?
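As far as I know scikit-learn ships no ensemble-clustering module, so one do-it-yourself approach is consensus clustering: run k-means several times, count how often each pair of points lands in the same cluster (a co-association matrix), then cluster that matrix. A rough sketch on toy blobs:

import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

# Co-association matrix: fraction of runs in which points i and j co-cluster
n_runs = 20
co = np.zeros((len(X), len(X)))
for seed in range(n_runs):
    labels = KMeans(n_clusters=3, n_init=10, random_state=seed).fit_predict(X)
    co += labels[:, None] == labels[None, :]
co /= n_runs

# Cluster the consensus; 1 - co behaves as a distance matrix
# (on scikit-learn < 1.2 pass affinity="precomputed" instead of metric=)
final = AgglomerativeClustering(n_clusters=3, metric="precomputed",
                                linkage="average").fit_predict(1 - co)
print(final[:20])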
Category: Data Science

Ensemble of different reservoirs (echo state networks)

Suppose I want to use reservoir computing to classify an input into the proper category (e.g. recognizing a handwritten letter). Ideally, after training a single reservoir and testing it, there would be an output vector y with one value close to 1 and the others close to 0. However, this is not the case in practice, and I don't want to make the reservoir bigger at the moment. I was therefore thinking of combining the predictions of a number of …
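A minimal sketch of just the combining step, assuming each trained reservoir already yields a vector of per-class scores for a sample; everything here is hypothetical scaffolding, not an echo state network implementation. Averaging the score vectors (soft voting) usually degrades more gracefully than majority voting on the argmaxes (hard voting):

import numpy as np

# Hypothetical per-class score vectors from 3 reservoirs for one input sample
outputs = np.array([[0.7, 0.2, 0.1],
                    [0.4, 0.5, 0.1],
                    [0.8, 0.1, 0.1]])

soft_vote = outputs.mean(axis=0).argmax()                 # average scores, then argmax
hard_vote = np.bincount(outputs.argmax(axis=1)).argmax()  # majority of per-reservoir picks
print(soft_vote, hard_vote)   # 0 0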
Category: Data Science

How to train with cross-validation? And which F1 score to choose?

I got similar results from two models that consist of similar algorithms. Model 1, with cv=10, has an f1_micro of 0.941 (see code below). Model 2, with only a train/test split (no CV), has an f1_micro of 0.953. Now here is my understanding problem. Earlier I ran a grid search to find the best hyperparameters. Now I would like to run just a cross-validation to train on the dataset, like the part marked in red in the picture. In the code there is still the grid search …
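For plain cross-validation with hyperparameters already fixed (no grid search), cross_val_score is the usual tool; a sketch with stand-in data and a stand-in classifier:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)   # stand-in data
clf = RandomForestClassifier(random_state=0)                # stand-in model

# 10-fold CV, scored the same way as model 1 above
scores = cross_val_score(clf, X, y, cv=10, scoring="f1_micro")
print(scores.mean(), scores.std())   # mean f1_micro and its spread across folds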
Category: Data Science

Are "Gradient Boosting Machines (GBM)" and GBDT exactly the same thing?

In the category of Gradient Boosting, I find some terms confusing. I'm aware that XGBoost includes some optimizations in comparison to conventional Gradient Boosting. But are Gradient Boosting Machines (GBM) and GBDT the same thing? Are they just different names? Apart from GBM/GBDT and XGBoost, are there any other models that fall into the category of Gradient Boosting?
Category: Data Science

In XGBoost, how does a leaf index correspond to a particular leaf node in the actual base learner trees?

I've trained an XGBoost model for regression, where the max depth is 2.

# Create the ensemble
ensemble_size = 200
ensemble = xgb.XGBRegressor(n_estimators=ensemble_size,
                            n_jobs=4,
                            max_depth=2,
                            learning_rate=0.1,
                            objective='reg:squarederror')
ensemble.fit(train_x, train_y)

I've plotted the first tree in the ensemble:

# Plot single tree
plot_tree(ensemble, rankdir='LR')

Now I retrieve the leaf indices of the first training sample in the XGBoost ensemble model:

ensemble.apply(train_x[:1])
# leaf indices in all 200 base learner trees
array([[6, 6, 4, 6, 4, 6, 5, 5, 4, 5, 4, …
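One way to line the numbers up, assuming the ensemble above is already fitted: dump the booster with trees_to_dataframe() and compare the Node column of tree 0 against the first value returned by apply(); the leaf index is the node ID that plot_tree() prints in each leaf box. A sketch reusing ensemble and train_x from the snippet above:

# Tree 0 as a table; leaf rows show Feature == 'Leaf'
trees = ensemble.get_booster().trees_to_dataframe()
print(trees[trees['Tree'] == 0][['Node', 'Feature', 'Gain']])

# Per tree, apply() returns the Node id of the leaf the sample lands in,
# e.g. the 6 above corresponds to node '0-6' of tree 0
print(ensemble.apply(train_x[:1])[0, 0])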
Category: Data Science

Stacking - Appropriate base and meta models

When implementing stacking for model building and prediction (for example using sklearn's StackingRegressor), what is the appropriate choice of models for the base models and the final meta-model? Should weak/linear models be used as the base models and an ensemble model as the final meta-model (for example: Lasso, Ridge and ElasticNet as base models, and XGBoost as the meta-model)? Or should non-linear/ensemble models be used as base models and linear regression as the final meta-model (for …
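The more common recipe is the latter: diverse, stronger learners at level 0 and a simple linear combiner on top. A sketch with sklearn's StackingRegressor, where the specific model choices are illustrative rather than a recommendation:

from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=400, n_features=10, random_state=0)

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(random_state=0)),
                ("gb", GradientBoostingRegressor(random_state=0))],
    final_estimator=RidgeCV(),   # simple linear combiner on top
    cv=5,                        # out-of-fold predictions feed the meta-model
)
print(stack.fit(X, y).score(X, y))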
Category: Data Science

What is the difference between ensemble methods and hybrid methods, or is there none?

I have the feeling that these terms are often used as synonyms for one another, since they share the same goal, namely increasing prediction accuracy by combining different algorithms. My question thus is: is there a difference between them? And if so, is there some book/paper that explains the difference?
Category: Data Science

What is the form of data used for prediction with generalized stacking ensemble?

I am very confused as to how the training data is split, and on what data the level-0 predictions are made, when using generalized stacking. This question is similar to mine, but the answer is not sufficiently clear: How predictions of level 1 models become training set of a new model in stacked generalization. My understanding is that the training set is split, base models are trained on one split, and predictions are made on the other. These predictions now become features …
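A sketch of the usual data flow, on toy data, using sklearn's cross_val_predict: every training row receives an out-of-fold level-0 prediction, and those columns (one per base model) form the training matrix of the level-1 model:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, random_state=0)
base_models = [LogisticRegression(max_iter=1000), GaussianNB()]

# One column of out-of-fold P(y=1) per base model: the level-1 feature matrix
level1_X = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

meta = LogisticRegression().fit(level1_X, y)   # level-1 (meta) model
print(level1_X.shape)   # (300, 2): one row per original training sample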
Category: Data Science

How can I improve my model on a very very small dataset?

I am starting as a PhD student, and we want to find appropriate materials (with certain qualities) based on basic chemical properties like charge, etc. There are a lot of models and datasets in similar works, but since our work is quite novel, we have to make and test each data sample ourselves. This makes data acquisition very slow and very expensive. We estimate 10-15 samples for some time, until we can expand the dataset. Now I …
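With 10-15 samples, about the only honest evaluation is leave-one-out cross-validation around a heavily regularized model; a sketch of that setup, on random placeholder data:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X, y = rng.normal(size=(12, 4)), rng.normal(size=12)   # placeholder for ~12 samples

# Strong regularization + LOO: every sample serves once as the test set
model = Ridge(alpha=10.0)
scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                         scoring="neg_mean_absolute_error")
print(-scores.mean())   # average held-out error over all 12 leave-one-out fits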
Category: Data Science

ValueError: Graph disconnected: cannot obtain value for tensor Tensor

I'm trying to perform a stacking ensemble of three VGG-16 models, all custom-trained on my personal dataset and having the same input shape. This is the code:

input_shape = (256,256,3)
model_input = Input(shape=input_shape)

def load_all_models(n_models):
    all_models = list()
    model_top1 = load_model('weights/vgg16_1.h5')
    all_models.append(model_top1)
    model_top2 = load_model('weights/vgg16_2.h5')
    all_models.append(model_top2)
    model_top3 = load_model('weights/vgg16_3.h5')
    all_models.append(model_top3)
    return all_models

n_members = 3
members = load_all_models(n_members)
print('Loaded %d models' % len(members))

# perform stacking
def define_stacked_model(members):
    for i in range(len(members)):
        model = members[i]
        for layer in model.layers:
            # make …
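This error usually means the stacked graph is being built from a new Input while the loaded sub-models are still wired to their own original Input tensors. One common fix, sketched below with illustrative names: rename each sub-model's layers to avoid name clashes, freeze them, and call each model on one shared input tensor so Keras reconnects the graph from a single source:

# Sketch of a define_stacked_model that avoids the disconnected-graph error
from tensorflow.keras.layers import Dense, Input, concatenate
from tensorflow.keras.models import Model

def define_stacked_model(members, input_shape=(256, 256, 3)):
    # Rename layers so the three VGG-16 copies don't collide, and freeze them
    for i, model in enumerate(members):
        for layer in model.layers:
            layer.trainable = False
            layer._name = 'ensemble_%d_%s' % (i + 1, layer.name)

    # One shared input tensor; calling each model on it reconnects the graph
    model_input = Input(shape=input_shape)
    ensemble_outputs = [model(model_input) for model in members]

    merge = concatenate(ensemble_outputs)
    hidden = Dense(10, activation='relu')(merge)
    output = Dense(members[0].output_shape[-1], activation='softmax')(hidden)
    return Model(inputs=model_input, outputs=output)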
Category: Data Science
