Class imbalance: Will transforming a multi-label (aka multi-task) problem into a multi-class problem help?
I noticed this question and this one, but my problem is more about class imbalance. I have, say, 1000 targets and some input samples (each with a feature vector). Each input sample can have label '1' for many targets (currently treated as tasks), meaning the sample interacts with that target. Label '0' means they don't interact, so each task is a binary classification problem.
Unbalanced data
My current issue is: for most targets, only about 1% of the samples (perhaps 1 or 2) are labelled 1. Since I have to split the data into train, validation, and test sets and calculate AUROC, there are in fact only 3 targets left with enough positives to support classification under some threshold (say, at least 5% positive labels across all samples).
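To make the filtering step concrete, here is a minimal sketch of how I count positives per target and keep only the targets above the threshold. The label matrix `Y` (samples by targets), the helper name `usable_targets`, and the toy numbers are all made up for illustration; only the 5% threshold comes from my setup.

```python
import numpy as np

def usable_targets(Y, min_pos_frac=0.05):
    """Return (indices of targets with enough positives, positive fraction per target)."""
    Y = np.asarray(Y)
    pos_frac = Y.mean(axis=0)  # fraction of samples labelled 1, per target
    return np.flatnonzero(pos_frac >= min_pos_frac), pos_frac

# Toy data (hypothetical): 100 samples, 4 targets
Y = np.zeros((100, 4), dtype=int)
Y[:10, 0] = 1  # 10% positives
Y[:1, 1] = 1   # 1% positives
Y[:2, 2] = 1   # 2% positives
Y[:6, 3] = 1   # 6% positives

kept, pos_frac = usable_targets(Y)
print(kept)      # indices of targets that pass the threshold
print(pos_frac)  # positive fraction for every target
```

With real data the same count would have to hold within each of the train, validation, and test splits, which is what makes the rare targets unusable for AUROC.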
Transform or not?
Someone has suggested modeling this as a multi-class problem instead of a multi-task problem, meaning I would transform the label vector of each sample into the set of targets labelled 1 and treat each distinct set as one class. For example, if sample A originally has label 1 for targets 12, 232, 988 (and 0 for all others), the new label for sample A would simply be {12, 232, 988}=label_id.
But this might make the situation worse, because a target (task) would no longer share positive examples across samples. For example, if sample B interacts with targets 12 and 232 only, then originally targets (tasks) 12 and 232 would each have two positively labelled data points (A and B), but after the transformation those two samples fall into two entirely different classes.
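A small sketch of this concern, using only the two example samples A and B from above (the variable names are mine, for illustration). It counts positives per target in the multi-label view and samples per class in the transformed multi-class view.

```python
from collections import Counter

# Samples and the targets they interact with (from the example above)
samples = {
    "A": {12, 232, 988},
    "B": {12, 232},
}

# Multi-label view: number of positive samples per target
per_target = Counter(t for labels in samples.values() for t in labels)

# Multi-class view: each distinct set of targets becomes its own class
per_class = Counter(frozenset(labels) for labels in samples.values())

print(per_target[12], per_target[232], per_target[988])
print(len(per_class), max(per_class.values()))
```

In the multi-label view, targets 12 and 232 each have two positives, while in the multi-class view A and B end up in two separate classes with one sample each.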
I would appreciate any suggestions! Side note: I'm using simple classifiers such as an MLP or SVM. If there are any methods designed specifically for imbalanced data (I'm not aware of any), pointers to those would also be wonderful.