How can I increase accuracy after oversampling?

Accuracy before oversampling:

On training: 98.54%; on testing: 98.21%

Accuracy after oversampling:

On training: 77.92%; on testing: 90.44%

What does this mean, and how can I increase the accuracy?

Edit:

Classes before SMOTE:

dataset['Label'].value_counts()

BENIGN           168051
Brute Force        1507
XSS                 652
Sql Injection        21

Classes after SMOTE:

BENIGN           117679 
Brute Force      117679 
XSS              117679 
Sql Injection    117679 

I used the following model:

- Random Forest:
  Train score: 0.49   Test score: 0.85
- Logistic Regression:
  Train score: 0.72   Test score: 0.93
- LSTM:
  Train score: 0.79   Test score: 0.98


It is unusual for test accuracy to exceed training accuracy. That said, after looking at the class distribution, some plausible observations/explanations are:

  1. The classes are highly imbalanced for a multi-class classification setting.
  2. You are using SMOTE for oversampling. You can also try the Adaptive Synthetic Sampling approach for imbalanced learning (ADASYN) and check whether the results improve.
  3. Most importantly: given the severe class imbalance, you should optimise for recall or F1 score rather than accuracy, since accuracy is not a reliable metric on highly imbalanced data. I would recommend optimising for recall.

Possible recommendations :

  1. Hyperparameter tuning
  2. Better regularisation
  3. K-fold cross-validation.
  4. Make sure the train, validation, and test sets are disjoint.
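Recommendations 3 and the metric advice above combine naturally into stratified K-fold cross-validation scored on macro F1: stratification preserves the class ratios in every fold, and the scorer avoids accuracy. A minimal sketch on synthetic stand-in data:

```python
# Stratified 5-fold cross-validation scored on macro F1.
# The data is a synthetic stand-in for the post's dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(
    n_samples=3000, n_classes=4, n_informative=6,
    weights=[0.94, 0.04, 0.015, 0.005], random_state=0,
)

# StratifiedKFold keeps each fold's class proportions close to the full data's.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="f1_macro",
)
print("macro F1 per fold:", scores)
print("mean ± std:", scores.mean(), scores.std())
```

The same `cv` and `scoring` arguments plug directly into `GridSearchCV` for recommendation 1 (hyperparameter tuning).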
