The picture below shows the error of an ensemble classifier. Can someone help me understand the notation? What does it mean to have (25 and i) in brackets, and what is ε^i: is it the error of the i-th classifier, or the error rate raised to the power i? Can someone explain this formula?
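For reference, if the picture shows the usual majority-vote formula for 25 independent base classifiers with a common error rate ε (an assumption, since the image itself isn't reproduced here), the computation it expresses can be sketched as:

    from math import comb

    def ensemble_error(n_classifiers=25, eps=0.35):
        # A majority vote of 25 classifiers is wrong when 13 or more of them err.
        # C(25, i) counts the ways to choose which i classifiers are wrong, and
        # eps**i * (1 - eps)**(25 - i) is the probability of that outcome.
        k = n_classifiers // 2 + 1
        return sum(comb(n_classifiers, i) * eps ** i * (1 - eps) ** (n_classifiers - i)
                   for i in range(k, n_classifiers + 1))

    print(ensemble_error())  # roughly 0.06 when each base classifier has eps = 0.35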
I have been implementing a DecisionTreeRegressor model in an Anaconda environment with a data set sourced from a 20 million row, 12-column CSV file. I can read the data set in chunks with chunksize set to 500,000 rows and compute the R-squared score on the train/test split in each of the 20 iterations. sklearn.__version__: 0.19.0, pandas.__version__: 0.20.3, numpy.__version__: 1.13.1. The GridSearchCV() instance uses a parameter grid with max_depth set to values …
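A minimal sketch of the chunked workflow described here (the file name, column names, and max_depth value are placeholders, not from the original post):

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.metrics import r2_score

    # Hypothetical file/column names; the real CSV has 20M rows and 12 columns.
    reader = pd.read_csv("data.csv", chunksize=500_000)

    for i, chunk in enumerate(reader, start=1):
        X = chunk.drop(columns=["target"])
        y = chunk["target"]
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
        model = DecisionTreeRegressor(max_depth=10)  # max_depth would come from GridSearchCV
        model.fit(X_train, y_train)
        print(f"chunk {i}: R^2 = {r2_score(y_test, model.predict(X_test)):.3f}")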
Let's say I'm building an app like Uber and I want to predict the user's most likely destination based on the user's past history, current latitude/longitude, and time/date. Here is the proposed architecture: let's say I have a pre-trained model hosted as a service. The part I'm struggling with is: how do I get the user's features from the database in real time, given the RiderID, so they can be used by the prediction service (XGBoost model)? I'm guessing a lookup in …
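One possible shape for that lookup, assuming the rider's historical features sit in a key-value store keyed by RiderID and the XGBoost model is loaded inside the prediction service (Redis, the key layout, and the file name are all assumptions for illustration, not part of the original question):

    import json
    import numpy as np
    import redis
    import xgboost as xgb

    r = redis.Redis(host="localhost", port=6379)
    booster = xgb.Booster()
    booster.load_model("destination_model.json")      # hypothetical model file

    def predict_destination(rider_id, lat, lon, time_features):
        # Fetch precomputed historical features for this rider, then append
        # the request-time features before calling the model.
        stored = json.loads(r.get(f"rider:{rider_id}"))
        row = np.array([[*stored["history"], lat, lon, *time_features]])
        return booster.predict(xgb.DMatrix(row))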
I am trying to make an ensemble model composed of two pre-trained models, using torch, in order to classify an image. Below is some code, based on this post.

    import timm
    import torch
    from torch import nn
    from torch.nn import functional as F

    num_classes = 100

    model1 = timm.create_model("efficientnet_b0", num_classes=num_classes)
    checkpoint1 = torch.load(checkpoint_path1)
    model1.load_state_dict(checkpoint1["model"])

    model2 = timm.create_model("efficientnet_b2", num_classes=num_classes)
    checkpoint2 = torch.load(checkpoint_path2)
    model2.load_state_dict(checkpoint2["model"])

    class EnsembleModel(nn.Module):
        def __init__(self, modelA, modelB, num_features):
            super().__init__()
            self.modelA = modelA
            self.modelB = modelB
            self.classifier = nn.Linear(2*num_features, num_features)

        def forward(self, x):
            x1 …
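For what it's worth, one plausible way the truncated forward could continue is to concatenate the two backbones' outputs before the linear layer. This is a guess at the intent, not the poster's actual code; checkpoint loading is omitted:

    import timm
    import torch
    from torch import nn

    num_classes = 100
    model1 = timm.create_model("efficientnet_b0", num_classes=num_classes)
    model2 = timm.create_model("efficientnet_b2", num_classes=num_classes)

    class EnsembleModel(nn.Module):
        def __init__(self, modelA, modelB, num_features):
            super().__init__()
            self.modelA = modelA
            self.modelB = modelB
            self.classifier = nn.Linear(2 * num_features, num_features)

        def forward(self, x):
            x1 = self.modelA(x)                      # (batch, num_features) logits
            x2 = self.modelB(x)                      # (batch, num_features) logits
            return self.classifier(torch.cat([x1, x2], dim=1))

    ensemble = EnsembleModel(model1, model2, num_features=num_classes)
    logits = ensemble(torch.randn(1, 3, 224, 224))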
I am building 3 neural network models on a dataset that is already separated into train and test sets. From my analysis, I found that this dataset has values in the test set which don't exist in the train set. And this gives a certain limitation, or maximum capacity, to my neural network model(s). By this I mean I cannot seem to improve the accuracy even if I change the hyperparameters or the parameters of my models. I have …
I have a classification problem where I am trying to predict if the data returns a 1 or 0, so your classic binary classification. I have split my data into the independent variables (the features I am training on) and the dependent variable (my target that I am predicting, either a 0 or 1). I am using log loss as the scoring metric for my model. Firstly, I am using the cv function in xgboost to …
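For concreteness, a small sketch of driving xgboost's cv function with log loss as the evaluation metric (the synthetic data and parameter values below are placeholders):

    import xgboost as xgb
    from sklearn.datasets import make_classification

    # Stand-in data for the real features/target split.
    X, y = make_classification(n_samples=1000, random_state=0)
    dtrain = xgb.DMatrix(X, label=y)

    params = {"objective": "binary:logistic", "eval_metric": "logloss", "max_depth": 4}
    cv_results = xgb.cv(params, dtrain, num_boost_round=200, nfold=5,
                        early_stopping_rounds=10, seed=0)
    print(cv_results["test-logloss-mean"].min())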
I was wondering whether it is possible to train an ML model for a classification task with dataset D, and then train another model to solve the same classification task, which takes as input dataset D and the output of the first model. I am worried about the data partition in this case (i.e. whether the models need to be trained and tested with different data samples). If this kind of ensemble method exists, what is it called? As …
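A sketch of one common way to set up the arrangement being described: the first model's out-of-fold predictions are appended to D as an extra feature for the second model, so the second model is never trained on predictions the first model made for its own training rows (the model choices and data below are placeholders):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    X, y = make_classification(n_samples=1000, random_state=0)   # stand-in for dataset D

    first = RandomForestClassifier(random_state=0)
    # Out-of-fold probabilities: each row's prediction comes from a model that
    # did not see that row during fitting.
    oof = cross_val_predict(first, X, y, cv=5, method="predict_proba")[:, 1]

    X_aug = np.column_stack([X, oof])            # dataset D plus the first model's output
    second = LogisticRegression(max_iter=1000).fit(X_aug, y)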
I'm new to data science and appreciate your sage advice! I need to build an incremental learning model, and I know there's a lot that goes into something like that, but I'd like to highlight the most fundamental, abstract requirement I have in my particular case and ask you to focus your attention on that. However I build the incremental learning model, it must never forget what it has learned. That is, when it learns something new, it can't forget something …
I'm not completely sure about the bias/variance of boosted decision trees (LightGBM especially), so I wonder whether we would generally expect a performance boost from creating an ensemble of multiple LightGBM models, just as with Random Forest?
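For concreteness, the kind of ensemble being asked about could be sketched like this: several LightGBM models that differ only in their random seed (with row/column subsampling enabled so the seed actually changes the trees), predictions averaged. The data and hyperparameters here are placeholders:

    import numpy as np
    import lightgbm as lgb
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=2000, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    models = [
        lgb.LGBMRegressor(n_estimators=300, subsample=0.8, subsample_freq=1,
                          colsample_bytree=0.8, random_state=seed).fit(X_tr, y_tr)
        for seed in range(5)
    ]
    avg_pred = np.mean([m.predict(X_te) for m in models], axis=0)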
I presently have 2 algorithms that have a numerical output. Using a threshold of 0.9, I get the classification output. Let's say they are: P (high precision, low recall) and R (high recall, low precision). Individually, they have poor F1 scores. Is the naive way of creating a classifier C as C(*) = x·P(*) + (1-x)·R(*) and optimizing for x and the threshold a good approach to improve the F1 score? Or is there some alternate approach I should try? Note: I …
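A small sketch of that naive combination with a joint grid search over the mixing weight x and the decision threshold, maximizing F1 on held-out data (p_scores, r_scores, and y_val are placeholders for the two algorithms' numerical outputs and the true labels):

    import numpy as np
    from sklearn.metrics import f1_score

    def best_combination(p_scores, r_scores, y_val):
        # Returns (best F1, best x, best threshold) for C = x*P + (1-x)*R.
        best = (0.0, None, None)
        for x in np.linspace(0, 1, 21):
            combined = x * p_scores + (1 - x) * r_scores
            for thr in np.linspace(0.05, 0.95, 19):
                f1 = f1_score(y_val, (combined >= thr).astype(int))
                if f1 > best[0]:
                    best = (f1, x, thr)
        return best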
I have multiple models predicting an outcome (continuous), and I want to take action to optimize the best values of these features to make a decision. Consider a regression model, y1 = m1·x1 + m2·x2 + m3·x3 + k, and another model, a propensity model, y2 = P[1 | x5, x6, x3] = 1 / (1 + exp(-m5·x5 - m6·x6 - m7·x3)). I would want to maximize y3, say y3 = f(y1, y2), by finding the optimal values of the features …
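A hedged sketch of that optimization, assuming for illustration that y3 = y1 · y2 (e.g. predicted value times propensity) and that the coefficients are already known from the two fitted models; all numbers and bounds below are made up:

    import numpy as np
    from scipy.optimize import minimize

    m1, m2, m3, k = 0.5, -0.2, 0.8, 1.0        # hypothetical regression coefficients
    m5, m6, m7 = 0.3, 0.1, -0.4                # hypothetical propensity coefficients

    def neg_y3(v):
        # Negative of y3 = y1 * y2, since scipy minimizes by default.
        x1, x2, x3, x5, x6 = v
        y1 = m1 * x1 + m2 * x2 + m3 * x3 + k
        y2 = 1.0 / (1.0 + np.exp(-(m5 * x5 + m6 * x6 + m7 * x3)))
        return -(y1 * y2)

    result = minimize(neg_y3, x0=np.zeros(5), bounds=[(-10, 10)] * 5)
    print(result.x, -result.fun)               # optimal feature values and max y3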
I'm looking at the best way of combining a CNN taking an image input with a scalar value. I know that one of the ways is to concatenate the flattened layer with this scalar value. But the flattened layer consists of, for example, 2048 values with a different magnitude than the single input value. And what if, in a real task, this scalar value has more influence than the image? Also, one of the examples is a combination of text and an image, and then …
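For illustration, a minimal PyTorch sketch of the concatenation approach, with the scalar passed through its own small layer so it isn't drowned out by the 2048 image features (the backbone and all sizes here are placeholders):

    import torch
    from torch import nn

    class ImagePlusScalar(nn.Module):
        def __init__(self, cnn_features=2048, scalar_embed=32, n_classes=10):
            super().__init__()
            self.backbone = nn.Sequential(       # stand-in for a real CNN backbone
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, cnn_features)
            )
            # Small branch that lifts the lone scalar into its own feature space.
            self.scalar_branch = nn.Sequential(nn.Linear(1, scalar_embed), nn.ReLU())
            self.head = nn.Linear(cnn_features + scalar_embed, n_classes)

        def forward(self, image, scalar):
            feats = self.backbone(image)
            s = self.scalar_branch(scalar.view(-1, 1))
            return self.head(torch.cat([feats, s], dim=1))

    model = ImagePlusScalar()
    out = model(torch.randn(4, 3, 64, 64), torch.randn(4))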
I'm studying ensemble learning methods, focusing on random forest and gradient boosting. I read this article about the topic and this one about meta-learning. Is it possible to say that ensemble learning is a subset of meta-learning?
I have three models: ARIMA, Auto ARIMA, and double exponential smoothing. I would like to apply an ensemble method: a voting method that lets the ensemble learn weights for these three models. I have checked the VotingClassifier present in scikit-learn. It requires fit(X, y) to run, but a time series stored in a Series object doesn't have a separate y. How do you apply a voting classifier and learn weights through grid search?
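Since VotingClassifier expects a supervised (X, y) setup, one sketch of learning the weights directly is to grid-search the convex weights that minimize the error of the weighted average forecast on a hold-out period. The prediction arrays and the actuals below are placeholders for the three models' forecasts:

    import itertools
    import numpy as np
    from sklearn.metrics import mean_squared_error

    def learn_weights(arima_pred, auto_arima_pred, des_pred, y_valid, step=0.05):
        # Search weights w1, w2 with w3 = 1 - w1 - w2, keeping them non-negative.
        preds = np.vstack([arima_pred, auto_arima_pred, des_pred])
        grid = np.arange(0, 1 + step, step)
        best = (np.inf, None)
        for w1, w2 in itertools.product(grid, grid):
            if w1 + w2 > 1:
                continue
            w = np.array([w1, w2, 1 - w1 - w2])
            mse = mean_squared_error(y_valid, w @ preds)
            if mse < best[0]:
                best = (mse, w)
        return best          # (validation MSE, learned weights)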
I have a dataset where some features (c, d) apply only when a feature (a) has a specific value. For example:

    a, b, c, d
    T, 60, 0x018, 3252002711
    U, 167, ,
    U, 67, ,
    T, 66, 0x018, 15556

So I'm planning to split the dataset so that there are no missing values:

    a, b, c, d
    T, 60, 0x018, 3252002711
    T, 66, 0x018, 15556

    a, b
    U, 167
    U, 67

and then put these into individual models which combine …
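A small pandas sketch of that split, assuming the data lives in a DataFrame with columns a, b, c, d:

    import pandas as pd

    df = pd.DataFrame({
        "a": ["T", "U", "U", "T"],
        "b": [60, 167, 67, 66],
        "c": ["0x018", None, None, "0x018"],
        "d": [3252002711, None, None, 15556],
    })

    df_t = df[df["a"] == "T"]                  # keeps a, b, c, d with no missing values
    df_u = df[df["a"] == "U"][["a", "b"]]      # keeps only the columns that apply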
I'm working on a regression model using boosting algorithms (CatBoost, XGBoost, and LightGBM). All models give similar accuracy, about 0.2 RMSE (the target varies from 0 to 1). I obtained the following plots when I plotted the residuals. My model is overpredicting for small target values (near zero) and underpredicting for large target values (near 1). How can I improve my model's performance? The model is not overfitting, I'm doing an exhaustive hyperparameter search, and I'm doing basic feature engineering. I'm trying to …
For example, say I am trying to predict whether I will win my next pickleball game. Some features I have are the number of hits, how much water I've drunk, etc., and the duration of the match. I'm asking specifically about ensemble models but will extend this question to other scenarios: what format would the duration column best be in? (e.g. milliseconds, seconds, minutes (integer), minutes (float), one column for minutes and one column for seconds, etc.)
I am trying to solve a multi-class classification problem. The goal is to predict whether a match will be won by the HomeTeam, won by the AwayTeam, or end in a Draw. I did feature engineering on the attributes and finally came up with the final data to train a classifier. I made sure that the data is balanced across all 3 classes. To train a classifier I tried an XGB Classifier, Logistic Regression, an SGD Classifier, and a normal DNN (TensorFlow Estimator). I checked the metrics for all the classifiers and I …
I am trying to implement a stacking model for an ML problem and am having a hard time figuring out the cross-validation strategy. So far I have used 10-fold cross-validation for all my models and would like to continue using that for stacking as well. Here's what I came up with, but I'm not sure if it makes sense: at each iteration of 10-fold CV, you will have 9 folds for training (training dataset) and 1 fold for testing (testing dataset). Divide the training …
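A sketch of the out-of-fold bookkeeping this kind of scheme usually boils down to: within each of the 10 iterations the base models fit on the 9 training folds, and their predictions on the held-out fold become that fold's rows of the meta-model's training matrix, so the meta-model never sees predictions made on data the base models trained on. The base models and data below are placeholders:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold

    X, y = make_classification(n_samples=1000, random_state=0)
    base_models = [RandomForestClassifier(random_state=0),
                   GradientBoostingClassifier(random_state=0)]

    oof = np.zeros((len(X), len(base_models)))
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
        for j, model in enumerate(base_models):
            model.fit(X[train_idx], y[train_idx])
            oof[test_idx, j] = model.predict_proba(X[test_idx])[:, 1]

    meta_model = LogisticRegression().fit(oof, y)   # level-2 model on OOF predictions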
I have a bunch of small neural networks (say, 5 to 50 feed-forward neural networks with only two hidden layers of 10-100 neurons each), which differ only in the weight initialization. I want to train them all on the same, smallish dataset (say, 10K rows) with a batch size of 1. The aim is to combine them into an ensemble by averaging the results. Now, of course I can build the whole ensemble as one neural network in …
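A minimal PyTorch sketch of the setup described: identically shaped MLPs that differ only in their random weight initialization, trained on the same data with batch size 1, predictions averaged at the end. The layer sizes, optimizer, and synthetic data are placeholders:

    import torch
    from torch import nn

    def make_mlp(in_dim=20, hidden=50, out_dim=1, seed=0):
        torch.manual_seed(seed)                  # only the initialization differs
        return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                             nn.Linear(hidden, hidden), nn.ReLU(),
                             nn.Linear(hidden, out_dim))

    X = torch.randn(10_000, 20)                  # stand-in for the 10K-row dataset
    y = X.sum(dim=1, keepdim=True)

    models = [make_mlp(seed=s) for s in range(10)]
    for model in models:
        opt = torch.optim.Adam(model.parameters())
        for xb, yb in zip(X.split(1), y.split(1)):   # batch size of 1, one epoch shown
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(xb), yb)
            loss.backward()
            opt.step()

    ensemble_pred = torch.stack([m(X) for m in models]).mean(dim=0)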