Oversampling on sequence (text) data

Question

AnonymousMe

August 3, 2021, 17:14

Has anyone been able to perform synthetic oversampling on sequential data? From what I've read and understood, the oversampling/undersampling techniques currently in use apply only to structured, tabular data.

But suppose I have sequential data like this:

 Sequence                    Label
 [1,2,3,5,0,0,0,0]           3
 [4,5,2,3,5,0,0,0]           5
 [3,4,0,0,0,0,0,0]           7

Each sequence consists of integer tokens followed by zero padding. How do I apply SMOTE or any other synthetic oversampling technique to data like this? I don't want to randomly replicate examples, since that adds no new information and is prone to overfitting.
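To make the difficulty concrete, here is a minimal sketch of what SMOTE-style linear interpolation does to two of the sequences above. It uses only NumPy, not the imbalanced-learn API. The helper name `smote_interpolate`, the choice of `lam=0.5`, and the assumption that the two sequences share a minority class are mine, purely for illustration.

```python
import numpy as np

# Two padded token-ID sequences from the table above (0 = padding).
# Assumption for illustration only: both belong to the same minority class.
seq_a = np.array([1, 2, 3, 5, 0, 0, 0, 0])
seq_b = np.array([4, 5, 2, 3, 5, 0, 0, 0])


def smote_interpolate(x, neighbor, lam):
    """Hypothetical helper: SMOTE-style synthetic point on the line
    segment between x and a same-class neighbor, with lam in [0, 1]."""
    return x + lam * (neighbor - x)


synthetic = smote_interpolate(seq_a, seq_b, lam=0.5)
print(synthetic)
```

The result contains fractional values such as 2.5, which are not valid token IDs, and the fifth position blends a padding value with a real token. Rounding would just pick unrelated tokens, which is why applying SMOTE directly to the raw integer sequences does not seem meaningful to me.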

Could someone give me suggestions as to how I can go about implementing this in Python?

Topic: imbalanced-learn, class-imbalance, scikit-learn, python, machine-learning

Category: Data Science
