How can I increase accuracy after oversampling?

Accuracy before oversampling:

On training: 98.54%; on testing: 98.21%

Accuracy after oversampling:

On training: 77.92%; on testing: 90.44%

What does this mean, and how can I increase the accuracy?

Edit:

Classes before SMOTE:

dataset['Label'].value_counts()

BENIGN           168051
Brute Force        1507
XSS                 652
Sql Injection        21

Classes after SMOTE:

BENIGN           117679 
Brute Force      117679 
XSS              117679 
Sql Injection    117679 

I used the following model:

- Random Forest:
  Train score: 0.49   Test score: 0.85
- Logistic Regression:
  Train score: 0.72   Test score: 0.93
- LSTM:
  Train score: 0.79   Test score: 0.98


It is unusual for test accuracy to exceed training accuracy. That said, after looking at the class distribution, some plausible observations/explanations are:

  1. The classes are highly imbalanced for a multi-class classification setting.
  2. You are using SMOTE for oversampling. You can also try the Adaptive Synthetic Sampling approach for imbalanced learning (ADASYN) and check whether the results improve.
  3. Most importantly: given the severe class imbalance, you should optimise for recall or F1 score rather than accuracy, since accuracy is not a reliable metric on highly imbalanced data. I would recommend optimising for recall.

Possible recommendations :

  1. Hyperparameter tuning
  2. Better regularisation
  3. K-fold cross-validation.
  4. Make sure the train, validation, and test sets are disjoint.
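Recommendations 3 and the metric advice above combine naturally into stratified K-fold cross-validation scored on macro F1: stratification preserves the class ratios in every fold, and the scorer avoids accuracy. A minimal sketch on synthetic stand-in data:

```python
# Stratified 5-fold cross-validation scored on macro F1.
# The data is a synthetic stand-in for the post's dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(
    n_samples=3000, n_classes=4, n_informative=6,
    weights=[0.94, 0.04, 0.015, 0.005], random_state=0,
)

# StratifiedKFold keeps each fold's class proportions close to the full data's.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="f1_macro",
)
print("macro F1 per fold:", scores)
print("mean ± std:", scores.mean(), scores.std())
```

The same `cv` and `scoring` arguments plug directly into `GridSearchCV` for recommendation 1 (hyperparameter tuning).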
