Accuracy before oversampling: training 98.54%, testing 98.21%. Accuracy after oversampling: training 77.92%, testing 90.44%. What does this mean, and how can I increase the accuracy? Edit: classes before SMOTE, from dataset['Label'].value_counts(): BENIGN 168051, Brute Force 1507, XSS 652, Sql Injection 21. Classes after SMOTE: BENIGN 117679, Brute Force 117679, XSS 117679, Sql Injection 117679. I used the following model: Random Forest: train score 0.49, test score 0.85 …
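For reference, a minimal sketch of the usual setup for this kind of experiment, assuming `dataset` is the poster's DataFrame with a 'Label' column: SMOTE is applied only to the training split, the test split keeps its natural imbalance, and per-class metrics are inspected rather than overall accuracy (which is dominated by the BENIGN class here).

```python
# Sketch only: resample the training split, never the test split, and look at
# per-class precision/recall instead of plain accuracy.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE

X = dataset.drop(columns=['Label'])
y = dataset['Label']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42)

# Oversample the training data only; the test set keeps its natural imbalance.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_res, y_res)

# Per-class metrics are far more informative than overall accuracy here.
print(classification_report(y_test, clf.predict(X_test)))
```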
I'm building a binary text classifier; the ratio between positives and negatives is 1:100 (100 / 10,000). By using back translation as an augmentation, I was able to get 400 more positives. Then I decided to do upsampling to balance the data. Do I include only the original positive data points (100), or should I also include the 400 that I generated? I will definitely try both, but I wanted to know if there is any rule of …
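A rough sketch of the "include both" option, assuming the back-translated positives have already been vectorised the same way as the originals; the arrays and sizes below are placeholders, not from the original post.

```python
# Pool original + augmented positives, then duplicate minority rows until the
# classes are balanced. sampling_strategy can be lowered if 1:1 is too aggressive.
import numpy as np
from imblearn.over_sampling import RandomOverSampler

rng = np.random.default_rng(0)
X_pos = rng.normal(size=(100, 50))    # original positives (placeholder features)
X_aug = rng.normal(size=(400, 50))    # back-translated positives (placeholder)
X_neg = rng.normal(size=(10000, 50))  # negatives (placeholder)

X_train = np.vstack([X_pos, X_aug, X_neg])
y_train = np.concatenate([
    np.ones(len(X_pos) + len(X_aug)),  # positives: original + augmented
    np.zeros(len(X_neg)),              # negatives
])

ros = RandomOverSampler(sampling_strategy=1.0, random_state=0)
X_res, y_res = ros.fit_resample(X_train, y_train)
```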
I had a question related to SMOTE. If you have a data set that is imbalanced, is it correct to use SMOTE when you are using BERT? I believe I read somewhere that you do not need to do this since BERT takes this into account, but I'm unable to find the article where I read that. Either from your own research or experience, would you say that oversampling using SMOTE (or some other algorithm) is useful when classifying using …
Reading the following article: https://kiwidamien.github.io/how-to-do-cross-validation-when-upsampling-data.html There is an explanation of how to use make_pipeline from imblearn.pipeline in order to perform cross-validation on an imbalanced dataset while avoiding data leakage. Here I copy the code used in the notebook linked by the article: X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=45) rf = RandomForestClassifier(n_estimators=100, random_state=13) imba_pipeline = make_pipeline(SMOTE(random_state=42), RandomForestClassifier(n_estimators=100, random_state=13)) cross_val_score(imba_pipeline, X_train, y_train, scoring='recall', cv=kf) new_params = {'randomforestclassifier__' + key: params[key] for key in params} grid_imba = GridSearchCV(imba_pipeline, param_grid=new_params, …
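The excerpt omits the definitions of `kf` and `params`, so below is a self-contained sketch of the same idea with assumed values for those pieces (a StratifiedKFold splitter and an illustrative parameter grid). The point of the pattern is that SMOTE sits inside the pipeline, so it is refit on each training fold and never sees the validation fold.

```python
# Self-contained sketch; the data, CV splitter and grid are assumptions,
# the pipeline pattern mirrors the notebook quoted above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=45)

imba_pipeline = make_pipeline(
    SMOTE(random_state=42),
    RandomForestClassifier(n_estimators=100, random_state=13))

kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)  # assumed CV splitter
print(cross_val_score(imba_pipeline, X_train, y_train, scoring='recall', cv=kf))

# Illustrative grid; the article's actual `params` dict is not shown in the excerpt.
params = {'n_estimators': [50, 100, 200], 'max_depth': [4, 6, None]}
new_params = {'randomforestclassifier__' + key: value for key, value in params.items()}
grid_imba = GridSearchCV(imba_pipeline, param_grid=new_params, cv=kf, scoring='recall')
grid_imba.fit(X_train, y_train)
print(grid_imba.best_params_, grid_imba.best_score_)
```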
For some classification needs, I have multivariate time series data composed from 4 satellite images, in the form (145521 pixels, 4 dates, 2 bands). I made a classification with tempCNN to classify the data into 5 classes. However, there is a big gap between classes 1 and 2, with 500 samples, and classes 4 and 5, with 1452485 samples. I'm wondering if there is a method that would help me oversample the first two classes to make my dataset more adequate for classification.
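One simple option, sketched below under the assumption that the data is a NumPy array of shape (n_pixels, n_dates, n_bands) with one label per pixel: flatten each pixel's time series into a vector, oversample only the rare classes with an off-the-shelf sampler such as SMOTE, then reshape back to the tempCNN input shape. All shapes and target counts are illustrative.

```python
# Sketch: flatten (pixels, dates, bands) -> (pixels, dates*bands) for SMOTE,
# oversample only the rare classes, reshape back for the CNN.
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4, 2))                 # (pixels, dates, bands), toy data
y = rng.choice([1, 2, 3, 4, 5], size=5000,
               p=[0.02, 0.02, 0.32, 0.32, 0.32])  # classes 1 and 2 are rare

n_pixels, n_dates, n_bands = X.shape
X_flat = X.reshape(n_pixels, n_dates * n_bands)

# Bring only the two rare classes up to an assumed target of 2000 samples each;
# the majority classes are left untouched.
smote = SMOTE(sampling_strategy={1: 2000, 2: 2000}, random_state=0)
X_res, y_res = smote.fit_resample(X_flat, y)

X_res = X_res.reshape(-1, n_dates, n_bands)       # back to CNN input shape
```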
I have 5 classes, one of them having only one sample. I've been researching oversampling techniques such as SMOTE and bootstrapping, but they do not work for the class with only one sample. I am considering repetition of this class. Are there any other strategies you would recommend? Would repetition followed by SMOTE make sense, or not really, given that SMOTE uses k-nearest neighbors?
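A minimal sketch of the repetition option, using imblearn's RandomOverSampler on illustrative data: it simply duplicates rows, so it works even for a one-sample class, whereas SMOTE interpolates between neighbors, and interpolating a point with copies of itself only reproduces the same point.

```python
# Sketch: RandomOverSampler duplicates minority rows, so a single-sample class
# is handled; SMOTE on identical duplicates would just regenerate that point.
import numpy as np
from imblearn.over_sampling import RandomOverSampler

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 10)),   # classes 0-3: 50 samples each
               rng.normal(size=(1, 10))])    # class 4: a single sample
y = np.concatenate([np.repeat([0, 1, 2, 3], 50), [4]])

ros = RandomOverSampler(random_state=0)      # every class brought up to 50
X_res, y_res = ros.fit_resample(X, y)
print(np.bincount(y_res))                    # [50 50 50 50 50]
```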
I'm using a multiclass dataset (cic-ids-2017), which is very imbalanced. I have already encoded the categorical feature (which is the target) using OneHotEncoder. I tried to use the SMOTE oversampling method to balance the data with a pipeline: X = df.drop(['Label'], axis=1) y = df.Label steps = [('onehot', OneHotEncoder()), ('smt', SMOTE())] pipeline = Pipeline(steps=steps) X, y = pipeline.fit_resample(X, y) When I used pd.get_dummies instead of OneHotEncoder, I could not use the pipeline (because of get_dummies). How can I balance the …
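For reference, a minimal sketch under the assumption that the cic-ids-2017 features are numeric and the only categorical column is the target 'Label': the target can stay as plain string labels (imblearn's SMOTE accepts those, so no one-hot encoding of y is needed for resampling), and the sampler goes inside an imblearn Pipeline together with the classifier. `df` is assumed to be the poster's loaded DataFrame.

```python
# Sketch: leave y as raw class labels, put SMOTE + classifier in one pipeline.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

X = df.drop(columns=['Label'])
y = df['Label']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# SMOTE inside the pipeline runs only at fit time, so the test data is
# never resampled.
pipe = Pipeline([('smt', SMOTE(random_state=42)),
                 ('clf', RandomForestClassifier(random_state=42))])
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))
```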
I am working with a tiny private dataset (192 samples) with 4 classes. A preprocessing step is necessary in order to do any classification. Among feature selection and extraction techniques, I decided to apply oversampling (SMOTE). Here is what I did: using the entire dataset (the original 192 samples), I create synthetic samples for each class using SMOTE, so I get a total of 500 samples per class (2000 in total). I have a big suspicion about this procedure because when …
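For comparison, a minimal sketch of the alternative procedure on simulated data: split (here via cross-validation) first, and let SMOTE run only on each training fold, so synthetic samples can never leak into the fold used for evaluation. The 192-sample, 4-class dataset and the SVC classifier below are assumptions for illustration only.

```python
# Sketch: SMOTE inside the pipeline is refit per training fold; the held-out
# fold contains only original samples.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline

X, y = make_classification(n_samples=192, n_classes=4, n_informative=6,
                           weights=[0.55, 0.25, 0.12, 0.08], random_state=0)

pipe = make_pipeline(SMOTE(k_neighbors=3, random_state=0), SVC())
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring='f1_macro')
print(scores.mean())
```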