Train/Test Split after performing SMOTE
I am dealing with a highly imbalanced dataset, so I used SMOTE to resample it.
After SMOTE resampling, I split the resampled dataset into training and test sets, using the training set to build a model and the test set to evaluate it.
However, I am worried that some data points in the test set might be synthetic points that SMOTE generated from data points in the training set (i.e., information is leaking from the training set into the test set), so the test set is not really a clean hold-out set.
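To make the concern concrete, here is a minimal sketch of what I mean. It is not the imbalanced-learn implementation: `smote_like` is a hypothetical, simplified stand-in that only interpolates between a minority point and one of its nearest minority neighbours, and `count_leaky_test_points` is a hypothetical helper that resamples first, splits second, and then counts how many synthetic test points were built from a real point that landed in the training set. The data, sizes, and seed are made up for illustration.

```python
import random


def smote_like(minority, n_new, k=3, rng=None):
    """Simplified SMOTE-style oversampling (an assumption, not the real library).

    Each synthetic point is a random interpolation between a minority point
    and one of its k nearest minority neighbours.
    Returns a list of (point, parent_index, neighbour_index).
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        # Squared distances from point i to every other minority point.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(minority[i], minority[j])), j)
            for j in range(len(minority))
            if j != i
        )
        j = rng.choice(dists[:k])[1]  # pick one of the k nearest neighbours
        gap = rng.random()            # interpolation factor in [0, 1)
        point = tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j]))
        synthetic.append((point, i, j))
    return synthetic


def count_leaky_test_points(n_minority=20, n_new=80, test_fraction=0.25, seed=0):
    """Resample first, split second, then count synthetic test points whose
    parent or neighbour is a real point in the training set."""
    rng = random.Random(seed)
    minority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_minority)]
    synthetic = smote_like(minority, n_new, rng=rng)

    # Track provenance only: (kind, parent_index, neighbour_index).
    rows = [("real", i, i) for i in range(n_minority)]
    rows += [("synthetic", i, j) for _, i, j in synthetic]

    # Random split performed AFTER resampling, as in my workflow.
    rng.shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    test, train = rows[:n_test], rows[n_test:]

    train_real = {i for kind, i, _ in train if kind == "real"}
    synthetic_test = [r for r in test if r[0] == "synthetic"]
    leaky = [r for r in synthetic_test if r[1] in train_real or r[2] in train_real]
    return len(leaky), len(synthetic_test)


if __name__ == "__main__":
    leaky, total = count_leaky_test_points()
    print(f"{leaky} of {total} synthetic test points derive from a real training point")
```

The exact count depends on the seed and the sizes chosen, and the sketch ignores the majority class and the reverse case (synthetic training points built from real test points), so it only illustrates the mechanism I am worried about.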
Does anyone have any similar experience? Does the information really leak from the training set into the test set? Or does SMOTE actually take care of this and we do not need to worry about it?
Topic smote class-imbalance evaluation machine-learning
Category Data Science