Methods for augmenting binary datasets

I have a small dataset (~100 samples) with roughly 20 features, most of which are binary and a few (~5) numeric. I want to augment the training set and see if I can get better test accuracy. What methods/code can I use for augmenting binary datasets?
Category: Data Science
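Since the features are mixed binary/numeric and the setting is supervised, one option is SMOTENC from imbalanced-learn, which interpolates the numeric columns but takes the nearest-neighbour majority value for the categorical ones, so synthetic rows keep valid 0/1 entries. A minimal sketch, assuming (hypothetically) that the first 15 columns are binary and using a sampling_strategy dict that roughly doubles each class:

    import numpy as np
    from collections import Counter
    from imblearn.over_sampling import SMOTENC

    # toy stand-in for the real data: 15 binary columns + 5 numeric ones
    rng = np.random.default_rng(0)
    X = np.hstack([rng.integers(0, 2, (100, 15)), rng.normal(size=(100, 5))])
    y = rng.integers(0, 2, 100)

    binary_cols = list(range(15))                            # indices of the binary features
    target = {cls: 2 * n for cls, n in Counter(y).items()}   # ~double each class

    sm = SMOTENC(categorical_features=binary_cols,
                 sampling_strategy=target, random_state=0)
    X_aug, y_aug = sm.fit_resample(X, y)
    print(X.shape, '->', X_aug.shape)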

What's the order for applying a SMOTE transformation in a pipeline?

Here's the thing: I have imbalanced data and was thinking about using the SMOTE transformation. However, when doing that in a sklearn pipeline, I get an error because of missing values. This is my code:

    from sklearn.pipeline import Pipeline

    # VARIABLE SELECTION
    categorical_features = ["MARRIED", "RACE"]
    continuous_features = ["AGE", "SALARY"]
    features = ["MARRIED", "RACE", "AGE", "SALARY"]

    # PIPELINE
    continuous_transformer = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="most_frequent")),
            ("scaler", StandardScaler()),
        ]
    )
    categorical_transformer = Pipeline(
        steps=[ …
Category: Data Science
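The missing-value error usually means SMOTE runs before imputation; SMOTE cannot handle NaNs. A hedged sketch of one fix, reusing the question's column names but with an assumed classifier: put the ColumnTransformer (which imputes) before the SMOTE step, and use imbalanced-learn's Pipeline, since sklearn's cannot hold samplers:

    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler, OneHotEncoder
    from sklearn.pipeline import Pipeline
    from sklearn.linear_model import LogisticRegression
    from imblearn.pipeline import Pipeline as ImbPipeline  # accepts samplers
    from imblearn.over_sampling import SMOTE

    continuous_transformer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ])
    categorical_transformer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", continuous_transformer, ["AGE", "SALARY"]),
        ("cat", categorical_transformer, ["MARRIED", "RACE"]),
    ])

    model = ImbPipeline(steps=[
        ("preprocess", preprocess),        # imputation runs first, so SMOTE sees no NaNs
        ("smote", SMOTE(random_state=0)),
        ("clf", LogisticRegression(max_iter=1000)),  # hypothetical classifier choice
    ])
    # model.fit(X_train, y_train)  # SMOTE is applied only during fit, never at predict time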

Optimizing decision threshold on model with oversampled/imbalanced data

I'm working on developing a model with a highly imbalanced dataset (0.7% minority class). To remedy the imbalance, I was going to oversample using algorithms from the imbalanced-learn library. I had a workflow in mind which I wanted to share and get an opinion on whether I'm heading in the right direction or maybe I missed something.

- Split Train/Test/Val
- Set up pipeline for GridSearch and optimize hyper-parameters (pipeline will only oversample training folds)
- Scoring metric will be AUC as training set is …
Category: Data Science
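One piece worth making explicit in such a workflow: tune the decision threshold on the untouched validation set, not on the oversampled training folds, since oversampling shifts the score distribution. A sketch, assuming a fitted pipeline model and held-out X_val/y_val, X_test/y_test (hypothetical names):

    import numpy as np
    from sklearn.metrics import precision_recall_curve

    # probabilities on the validation set, which was never oversampled
    val_proba = model.predict_proba(X_val)[:, 1]

    precision, recall, thresholds = precision_recall_curve(y_val, val_proba)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    best = np.argmax(f1[:-1])          # the last PR point has no threshold
    threshold = thresholds[best]

    # apply the tuned threshold once, on the test set
    test_pred = (model.predict_proba(X_test)[:, 1] >= threshold).astype(int)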

Unbalanced data set - how to optimize hyperparams via grid search?

I would like to optimize the hyperparameters C and gamma of an SVC using grid search for an unbalanced data set. So far I have used class_weight='balanced' and selected the best hyperparameters based on the average of the f1-scores. However, the data set is very unbalanced, i.e. if I choose GridSearchCV with cv=10, then some minority classes are not represented in the validation data. I'm thinking of using SMOTE, but I see the problem here that I would have …
Category: Data Science
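One way to combine the two ideas, sketched under the question's setup (SVC, C and gamma): keep SMOTE inside an imbalanced-learn pipeline so it only touches the training folds, and use StratifiedKFold so every validation fold contains minority samples. The grid values here are illustrative:

    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.svm import SVC
    from imblearn.pipeline import Pipeline
    from imblearn.over_sampling import SMOTE

    pipe = Pipeline([
        ("smote", SMOTE(random_state=0)),  # applied to the training folds only
        ("svc", SVC()),                    # SMOTE here replaces class_weight='balanced'
    ])
    param_grid = {
        "svc__C": [0.1, 1, 10, 100],
        "svc__gamma": [0.001, 0.01, 0.1, 1],
    }
    # stratification keeps each class's ratio in every fold; with tiny classes,
    # SMOTE's k_neighbors may also need lowering
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    search = GridSearchCV(pipe, param_grid, scoring="f1_macro", cv=cv)
    # search.fit(X, y)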

class imbalance - applied SMOTE - next steps

I am new to ML and have learnt a lot from your valuable posts. I need your advice on the following situation and guidance on whether these steps make sense. I have a binary classification problem; my dataset has a severe imbalance: approximately 2% positive cases (4,000 cases) out of a total of 200,000 cases. I separated my dataset into a train and a test set (80/20 stratified split). My train set now has a total of 160,000 cases (3,200 positive cases) and test …
Category: Data Science
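For the resampling step itself, the safe pattern is to fit SMOTE on the training split only and leave the test split untouched. A sketch with hypothetical names, assuming the 80/20 stratified split from the question has already been made:

    from collections import Counter
    from imblearn.over_sampling import SMOTE
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report

    sm = SMOTE(random_state=0)
    X_train_res, y_train_res = sm.fit_resample(X_train, y_train)  # train only
    print(Counter(y_train), "->", Counter(y_train_res))

    clf = RandomForestClassifier(random_state=0)  # hypothetical model choice
    clf.fit(X_train_res, y_train_res)
    # the test set keeps the real ~2% prevalence, so its metrics stay honest
    print(classification_report(y_test, clf.predict(X_test)))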

Noise Elimination with majority vote filtering

I have a dataset with label noise which I want to clean with majority/consensus vote filtering. This means I will divide the data into K folds and train an ensemble of models. Then, using the predictions on the data, I will remove rows which are misclassified by most models (majority voting) or by all of them (consensus voting). I have a few questions to which I can't find the answers elsewhere:

- how to decide what models to use in the ensemble
- the dataset is very …
Category: Data Science
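On the mechanics (model choice aside; heterogeneous models with different biases are the usual recommendation), here is a minimal sketch of both filter variants, assuming a feature matrix X and noisy labels y:

    import numpy as np
    from sklearn.model_selection import cross_val_predict
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier

    def vote_filter(X, y, models, cv=5, consensus=False):
        # out-of-fold predictions: each row is predicted by models
        # that never saw it during training
        preds = np.array([cross_val_predict(m, X, y, cv=cv) for m in models])
        wrong = preds != y                          # shape (n_models, n_samples)
        if consensus:
            return wrong.all(axis=0)                # flagged by every model
        return wrong.sum(axis=0) > len(models) / 2  # flagged by a majority

    models = [LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=200, random_state=0),
              KNeighborsClassifier()]
    noisy = vote_filter(X, y, models)
    X_clean, y_clean = X[~noisy], y[~noisy]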

Solving multi-class imbalanced classification using SMOTE and OSS

I am trying to solve a multi-class imbalanced classification problem, using SMOTE for oversampling and OSS (One-Sided Selection) for under-sampling. But I have a doubt: since I am working on multi-class data, I have to convert it into binary classification, which can be done with OVA/OAA. So how can I use OVA/OAA with both under-sampling and oversampling on the same dataset?
Category: Data Science
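One hedged way to wire this up: let sklearn's OneVsRestClassifier do the OVA conversion. It clones its inner estimator once per class and fits each clone on binarized labels, so an imbalanced-learn pipeline holding both samplers gets resampled separately for every one-vs-all problem. A sketch with an assumed LinearSVC base model:

    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import LinearSVC
    from imblearn.pipeline import Pipeline
    from imblearn.over_sampling import SMOTE
    from imblearn.under_sampling import OneSidedSelection

    per_class = Pipeline([
        ("smote", SMOTE(random_state=0)),            # oversample the minority of each binary task
        ("oss", OneSidedSelection(random_state=0)),  # then clean the majority side
        ("clf", LinearSVC(max_iter=5000)),
    ])
    ova = OneVsRestClassifier(per_class)
    # ova.fit(X_train, y_train)  # y_train keeps the original multi-class labels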

Train score is much lower than test score, is that normal?

I am working on a very imbalanced dataset and used SMOTEENN (SMOTE + ENN) to rebalance it; the following test was made using a Random Forest classifier. My train and test scores before using SMOTEENN:

    print('Train Score: ', rf_clf.score(x_train, y_train))
    print('Test Score: ', rf_clf.score(x_test, y_test))
    Train Score: 0.92
    Test Score: 0.91

After using SMOTEENN:

    print('Train Score: ', rf_clf.score(x_train, y_train))
    print('Test Score: ', rf_clf.score(x_test, y_test))
    Train Score: 0.49
    Test Score: 0.85

Edit

    x_train, x_test, y_train, y_test = train_test_split(feats, targ, test_size=0.3, random_state=47)
    scaler = MinMaxScaler()
    scaler_x_train = scaler.fit_transform(x_train)
    scaler_x_test = scaler.transform(x_test)
    X …
Category: Data Science
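A likely explanation: after SMOTEENN, the training score is computed on the resampled (balanced) data, which is a different and harder distribution than the untouched test set. Scoring on the original training data makes the two numbers comparable again; a sketch reusing the question's variable names:

    from imblearn.combine import SMOTEENN

    sme = SMOTEENN(random_state=47)
    x_res, y_res = sme.fit_resample(scaler_x_train, y_train)  # resample train only
    rf_clf.fit(x_res, y_res)

    # score on the ORIGINAL (un-resampled) training data, so both numbers
    # are measured on the same distribution as the test set
    print('Train Score: ', rf_clf.score(scaler_x_train, y_train))
    print('Test Score: ', rf_clf.score(scaler_x_test, y_test))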

SMOTE for image dataset

I'm working on image augmentation with SMOTE. I'm confused about how SMOTE can be useful for an image dataset containing 5,955 images in four classes (2552, 227, 621, 2555). Could anyone please help me? It would be greatly appreciated!
Category: Data Science

Preferred approaches for imbalanced data

I am building a binary classification model with an imbalanced target variable (13% class 1 vs 87% class 0). I am considering the following three options to handle the imbalance:

Option 1: Create a balanced training dataset with a 50%/50% split of the target variable.
Option 2: Sample the dataset as-is (i.e., 87%/13% split) and use upsampling methods (e.g., SMOTE) to balance the target variable to a 50%/50% split.
Option 3: Use learning methods with appropriate …
Category: Data Science
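For Option 3, the cheapest baseline is usually cost-sensitive learning via class_weight, which reweights the loss instead of touching the data. A sketch, with the model choice being an assumption:

    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier

    # 'balanced' sets each class weight to n_samples / (n_classes * class_count):
    # at an 87/13 split that is roughly 0.57 for class 0 and 3.85 for class 1
    clf = LogisticRegression(class_weight="balanced", max_iter=1000)
    # or: clf = RandomForestClassifier(class_weight="balanced", random_state=0)
    # clf.fit(X_train, y_train)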

How does SMOTE work for dataset with only categorical variables?

I have a small dataset of 977 rows with a class proportion of 77:23. For the sake of metric improvement, I have kept my minority class ('default') as class 1 (and 'not default' as class 0). My input variables are categorical in nature. So, the below is what I tried (let's assume we don't have age and salary info):

a) Apply encoding like rare_encoding and ordinal_encoding to my dataset
b) Split into train and test sets (with stratify=y) …
Category: Data Science
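Worth noting for the all-categorical case: imbalanced-learn (>= 0.8) ships SMOTEN, a SMOTE variant for nominal-only data that replaces interpolation with a nearest-neighbour vote over category values, so no impossible in-between codes are created. A sketch, assuming the encoded training split from steps (a)/(b):

    from imblearn.over_sampling import SMOTEN

    sampler = SMOTEN(random_state=0)
    # X_train_enc / y_train are hypothetical names for the encoded training split
    X_train_res, y_train_res = sampler.fit_resample(X_train_enc, y_train)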

Why is SMOTE not used in prize-winning Kaggle solutions?

The Synthetic Minority Over-sampling Technique (SMOTE) is a well-known method for tackling imbalanced datasets. There are many highly cited papers out there claiming that it is used to boost accuracy in unbalanced data scenarios. But when I look at Kaggle competitions, it is rarely used; to the best of my knowledge, there are no prize-winning Kaggle/ML competitions where it was used to achieve the best solution. Why is SMOTE not used in Kaggle? I even see applied research …
Category: Data Science

Why does class_weight usually outperform SMOTE?

I'm trying to figure out what exactly class_weight from sklearn does. When working with imbalanced datasets, I always use class_weight because the results are usually better than with SMOTE. However, I'm not sure why. I've tried to find an answer, but most of the answers regarding the subject are vague. For instance, the first answer here explains class_weight in a way that looks similar to SMOTE. This and this also didn't provide an answer. I read once that SMOTE is used …
Category: Data Science
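The mechanical difference is small but real: class_weight rescales each sample's contribution to the loss, while SMOTE adds synthetic rows that the model treats as real data. A tiny sketch of what 'balanced' actually computes:

    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    y = np.array([0] * 90 + [1] * 10)  # toy 90/10 imbalance
    weights = compute_class_weight("balanced", classes=np.unique(y), y=y)
    print(weights)  # [0.5556 5.0]: an error on class 1 costs 9x more
    # class_weight changes only the loss; SMOTE changes the data instead,
    # inserting interpolated minority rows between existing neighbours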

Follow up question regarding Upsampling for Imbalanced Data and the use of ADASYN instead of SMOTE

I have a follow-up question regarding this topic. I have been working on a project predicting success (1) or failure (0) for organizations using the Decision Tree and Random Forest algorithms. My dataset has a minority class of successes which I would like to upsample using SMOTE or ADASYN. I understand that the reasoning mentioned in this post applies to SMOTE and to random upsampling by duplication, but does this also apply to upsampling via ADASYN? As I understand, ADASYN introduces even …
Category: Data Science
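For the mechanics, both samplers share an interface, so they are easy to compare side by side; ADASYN skews generation toward minority points whose neighbourhoods are dominated by the majority class. A sketch with hypothetical train-split names:

    from collections import Counter
    from imblearn.over_sampling import SMOTE, ADASYN

    smote = SMOTE(random_state=0)
    adasyn = ADASYN(random_state=0)  # density-adaptive: more synthetics near the boundary

    X_sm, y_sm = smote.fit_resample(X_train, y_train)
    X_ad, y_ad = adasyn.fit_resample(X_train, y_train)
    print(Counter(y_sm), Counter(y_ad))  # ADASYN's counts are only approximately balanced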

Is it good practice to use SMOTE on an imbalanced data set when using a BERT model for text classification?

I had a question related to SMOTE. If you have a data set that is imbalanced, is it correct to use SMOTE when you are using BERT? I believe I read somewhere that you do not need to do this since BERT takes this into account, but I'm unable to find the article where I read that. Either from your own research or experience, would you say that oversampling with SMOTE (or some other algorithm) is useful when classifying using …
Category: Data Science
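One practical note: SMOTE interpolates feature vectors, so it cannot be applied to raw token sequences; a common alternative when fine-tuning BERT is to reweight the loss. A hedged PyTorch sketch, where train_labels is a hypothetical array of integer class labels:

    import numpy as np
    import torch
    from sklearn.utils.class_weight import compute_class_weight

    weights = compute_class_weight("balanced",
                                   classes=np.unique(train_labels), y=train_labels)
    loss_fn = torch.nn.CrossEntropyLoss(
        weight=torch.tensor(weights, dtype=torch.float))
    # use loss_fn(logits, labels) in the fine-tuning loop instead of resampling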

Train/Test Split after performing SMOTE

I am dealing with a highly unbalanced dataset, so I used SMOTE to resample it. After SMOTE resampling, I split the resampled dataset into training/test sets, using the training set to build a model and the test set to evaluate it. However, I am worried that some data points in the test set might actually be jittered copies of data points in the training set (i.e. information is leaking from the training set into the test set), so the test …
Category: Data Science
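The worry is justified: synthetic test points interpolated from training points inflate the scores. The standard fix is to split first and resample only the training portion, sketched here with hypothetical names:

    from sklearn.model_selection import train_test_split
    from imblearn.over_sampling import SMOTE

    # leaky order (what the question describes):
    #   X_res, y_res = SMOTE().fit_resample(X, y)
    #   ... then train_test_split(X_res, y_res)
    #   -> some test points are jittered copies of training points

    # safe order: split first, resample the training split only
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    X_train_res, y_train_res = SMOTE(random_state=0).fit_resample(X_train, y_train)
    # X_test / y_test now contain only real, untouched samples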

SMOTE for multi-class balance changes the shape of my dataset

So I have a dataset of shape (430, 17) that consists of 13 (imbalanced) classes and 17 features. The end goal is to create a NN, which by the way works when I import the imbalanced dataset; however, when I try to over-sample the minority classes using SMOTE in a Jupyter notebook, the classes do get balanced but the shape changes too.

    from imblearn.over_sampling import SMOTE
    from sklearn.preprocessing import OneHotEncoder
    from imblearn.pipeline import Pipeline

    steps = [('onehot', OneHotEncoder()), ('smt', SMOTE())]
    pipeline = Pipeline(steps=steps)
    X_res, …
Category: Data Science
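Both shape changes are expected rather than a bug: SMOTE adds rows until all 13 classes match the majority count, and OneHotEncoder widens the 17 features into one column per distinct value. A sketch of the question's pipeline making both effects visible, assuming X and y hold the (430, 17) data and its labels:

    from imblearn.pipeline import Pipeline
    from imblearn.over_sampling import SMOTE
    from sklearn.preprocessing import OneHotEncoder

    steps = [('onehot', OneHotEncoder()), ('smt', SMOTE())]
    pipeline = Pipeline(steps=steps)
    X_res, y_res = pipeline.fit_resample(X, y)

    # rows grow to 13 * majority_count; columns grow with the one-hot expansion
    print(X.shape, '->', X_res.shape)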
