How to handle an imbalanced NLP text dataset, e.g. when some classes have only 2 records
I am working on a dataset with around 2000 records.
Around 80% of the records have categorical labels.
There are around 200 categories: some have more than 20 records, whereas others have only two.
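A quick way to see how skewed the label distribution is would be to count records per category and flag the rare ones. The sketch below assumes the labels are held in a plain Python list with `None` for the roughly 20% of unlabeled records. The function name, the `min_count` threshold and the category names are all hypothetical.

```python
from collections import Counter

def summarize_label_counts(labels, min_count=5):
    """Count records per category and list categories with fewer than min_count records.

    Unlabeled records (assumed to be stored as None) are skipped.
    """
    counts = Counter(label for label in labels if label is not None)
    rare = sorted(label for label, n in counts.items() if n < min_count)
    return counts, rare

# Hypothetical toy data: three categories of very different sizes plus one unlabeled record.
labels = ["billing"] * 22 + ["shipping"] * 6 + ["refund"] * 2 + [None]
counts, rare = summarize_label_counts(labels)
print(counts.most_common())  # [('billing', 22), ('shipping', 6), ('refund', 2)]
print(rare)                  # ['refund']
```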
Because this is a text dataset, I cannot oversample the minority categories with the augmentation techniques I would use for images.
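Image-style augmentation does not carry over to text directly. Plain random oversampling, however, only duplicates existing records and so does not depend on the data type. The following is a minimal sketch of that technique, assuming the labeled training texts and their labels are in two parallel lists. `random_oversample` and `target_count` are hypothetical names. Because the added rows are exact copies, a model can overfit to them, and the resampling should be applied only to the training split after splitting, so that duplicates do not leak into validation.

```python
import random
from collections import Counter, defaultdict

def random_oversample(texts, labels, target_count, seed=0):
    """Duplicate records of small classes until each class has at least target_count records.

    Assumes texts and labels are parallel lists of labeled training records only.
    Added records are exact copies drawn at random from the same class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    # Keep the original records first, then append the duplicates.
    out_texts, out_labels = list(texts), list(labels)
    for label, items in by_class.items():
        shortfall = target_count - len(items)
        for _ in range(max(0, shortfall)):
            out_texts.append(rng.choice(items))
            out_labels.append(label)
    return out_texts, out_labels

# Hypothetical usage on a tiny training split.
texts = ["great service", "slow delivery", "late again", "wrong item"]
labels = ["praise", "delay", "delay", "return"]
new_texts, new_labels = random_oversample(texts, labels, target_count=2)
print(sorted(Counter(new_labels).items()))  # [('delay', 2), ('praise', 2), ('return', 2)]
```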
I am using fastai, which is built on PyTorch.
What can I do about this?
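One commonly used alternative to resampling is to weight the loss so that errors on rare categories cost more. A sketch of inverse-frequency weights follows. It uses the formula `n_samples / (n_classes * count[c])`, the same heuristic as scikit-learn's `class_weight="balanced"`. The function name and the example categories are hypothetical. The weights must be listed in the same order as the class indices the model uses, and the sketch assumes the `classes` argument is given in that order.

```python
from collections import Counter

def balanced_class_weights(labels, classes):
    """Return one weight per class: n_samples / (n_classes * count[c]).

    Weights are returned in the order of `classes`, which is assumed to match
    the model's class-index order. Classes with no records are rejected,
    because their weight would require dividing by zero.
    """
    counts = Counter(labels)
    missing = [c for c in classes if counts[c] == 0]
    if missing:
        raise ValueError(f"classes with no records: {missing}")
    n_samples, n_classes = len(labels), len(classes)
    return [n_samples / (n_classes * counts[c]) for c in classes]

# Hypothetical example: 20 "billing" records and 2 "refund" records.
labels = ["billing"] * 20 + ["refund"] * 2
weights = balanced_class_weights(labels, classes=["billing", "refund"])
# 22 / (2 * 20) = 0.55 for "billing", 22 / (2 * 2) = 5.5 for "refund"
```

In PyTorch, such a list could be converted with `torch.tensor(weights)` and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`. How that loss is plugged into a fastai `Learner` is left out here.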
Topics: fastai, pytorch, class-imbalance, nlp
Category Data Science