Can an Imbalanced Dataset be an Opportunity for Transfer Learning with Neural Networks?
When solving classification tasks on imbalanced datasets with Neural Networks (NN), there are two general ways of handling the imbalance:
A. Resample the data, either with over- or undersampling, until it is balanced.
B. Compute a weight per sample, to weight the losses according to class occurrence.
I thought of a third way that might be possible:
C. Train an Autoencoder on all the input data, and use its encoding part (the first layers) as the start of the actual classifier. This way, the information in the majority classes would not be thrown away (as in A), nor would every sample need to be put through the actual classifier during training (as in B), potentially saving computational cost. A sketch of this idea follows below.
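To make option C concrete, here is a minimal Keras sketch of what I have in mind. The layer sizes, the random placeholder data, and the names `X_all`, `X_bal`, `y_bal` are illustrative assumptions, not part of the actual problem:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder shapes and random data; the real dataset would go here.
input_dim, latent_dim, n_classes = 100, 16, 4
rng = np.random.default_rng(0)
X_all = rng.normal(size=(1000, input_dim)).astype("float32")  # every sample
X_bal = X_all[:200]                                           # balanced subset
y_bal = rng.integers(0, n_classes, size=200)                  # its labels

# 1. Autoencoder trained on ALL data, ignoring labels and class imbalance.
encoder = keras.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(latent_dim, activation="relu"),
], name="encoder")
decoder = keras.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(input_dim),
], name="decoder")
autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_all, X_all, epochs=2, batch_size=64, verbose=0)

# 2. Reuse the trained encoder as the first layers of the classifier.
encoder.trainable = False  # freeze, or fine-tune later with a low learning rate
classifier = keras.Sequential([
    encoder,
    layers.Dense(32, activation="relu"),
    layers.Dense(n_classes, activation="softmax"),
])
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
# Only the (smaller) balanced subset has to pass through the classifier head.
classifier.fit(X_bal, y_bal, epochs=2, batch_size=64, verbose=0)
```

Whether to freeze the encoder or fine-tune it with a reduced learning rate is an open design choice here; freezing keeps the majority-class representation intact, while fine-tuning lets the classifier adapt it.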
Has anyone had any experience with this kind of approach, or does anyone know of a publication regarding it?
To follow up on Dave's comment: I have an imbalanced dataset that gave a deceptively high Accuracy (0.90) compared to Precision (never over 0.2) and Recall (never over 0.2). Balancing the dataset with undersampling did the trick and got the Precision and Recall up, at the cost of some Accuracy. The model should predict several stages of an illness (multi-class rather than binary classification) as an aid for practitioners, where a False Positive is considered less harmful than a False Negative.
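For comparison with the undersampling result, here is a minimal sketch of option B (per-class loss weights), which keeps all samples instead of discarding majority-class data; `y_train` is a made-up label array standing in for the real stage labels:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_train = np.array([0] * 900 + [1] * 60 + [2] * 40)  # assumed imbalance

classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced",
                               classes=classes, y=y_train)
class_weight = dict(zip(classes, weights))
print(class_weight)  # rarer stages get proportionally larger loss weights
```

Keras accepts this mapping via `model.fit(..., class_weight=class_weight)`, so no sample is thrown away; the weights for the illness stages could also be raised further by hand to penalise False Negatives more heavily than False Positives.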
Topic imbalanced-data loss-function neural-network
Category Data Science