Train score is much lower than test score, is that normal?

I am working on a very imbalanced dataset. I used SMOTEENN (SMOTE + ENN) to rebalance it; the following test was made using a Random Forest classifier:

My train and test scores before using SMOTEENN:

print('Train Score: ', rf_clf.score(x_train, y_train))
print('Test Score: ', rf_clf.score(x_test, y_test))
Train Score: 0.92
Test Score: 0.91

After using SMOTEENN:

print('Train Score: ', rf_clf.score(x_train, y_train))
print('Test Score: ', rf_clf.score(x_test, y_test))
Train Score: 0.49
Test Score: 0.85

Edit

import time

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import EditedNearestNeighbours
from imblearn.combine import SMOTEENN

x_train, x_test, y_train, y_test = train_test_split(feats, targ, test_size=0.3, random_state=47)

# Scale features; the scaler is fitted on the training split only
scaler = MinMaxScaler()
scaler_x_train = scaler.fit_transform(x_train)
scaler_x_test = scaler.transform(x_test)
X = scaler_x_train
y = y_train.values

# Resample the (scaled) training data only
oversample = SMOTEENN(random_state=101, smote=SMOTE(), enn=EditedNearestNeighbours(sampling_strategy='majority'))
start = time.time()
X, y = oversample.fit_resample(X, y)
stop = time.time()
print(f'Training time: {stop - start}s')

rf_model = RandomForestClassifier(n_estimators=200, class_weight='balanced', criterion='entropy', random_state=0, verbose=1, max_depth=2)
rf_mod = OneVsRestClassifier(rf_model)
rf_mod.fit(X, y)

Tags: score, smote, random-forest

Category Data Science


You are probably not applying the same preprocessing to the test dataset. If you put the logic into an imbalanced-learn Pipeline, the appropriate resampling will be handled for you automatically: the resampling step is applied only during fit, never when predicting or scoring, so both train and test scores are computed on real (un-resampled) samples.
