What ML techniques work on imbalanced datasets?
I have two specific questions that I could not find answered in textbooks or research articles, and I would be grateful for an answer:
Are there ML techniques that can be applied directly to class-imbalanced datasets, or is it standard practice to balance the dataset first, either with a weighting approach or with resampling methods such as SMOTE? What is the standard approach for real-world/industry datasets? I am referring to problems such as fraud detection, anomaly detection, and water leak detection, where the dataset is inherently imbalanced.
Suppose I handle class imbalance with a weighted loss function. The loss would be computed on some amount of data, say 100 examples of streaming data. During deployment, I may not receive exactly 100 examples; there could be more or fewer than were used in training. Suppose the weights come from the inverse class frequency approach, which depends on the number of examples in each class. If the weights were computed on 100 examples during training, do I always need 100 examples at deployment in order to make predictions on them? Or is there no dependency between the number of training examples and the number of examples at deployment when doing class balancing?
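To make the second question concrete, here is a minimal sketch of what I mean by inverse class frequency weighting. The formula w_c = N / (K * n_c), the function names, and the 5/95 class split are my own assumptions for illustration, not taken from any particular library or dataset. In this sketch the weights are derived once from the training labels and then used as a fixed per-class lookup inside the loss; whether that is the right way to think about deployment is what I am asking.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = N / (K * n_c).

    N is the number of examples, K the number of distinct classes,
    and n_c the number of examples in class c (assumed formula).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


def weighted_log_loss(y_true, p_pred, weights):
    """Class-weighted binary cross-entropy, normalised by the total weight.

    p_pred holds the predicted probability of class 1 for each example.
    """
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_pred):
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


# Hypothetical training window: 100 examples, 5 of them positive (e.g. fraud).
train_labels = [1] * 5 + [0] * 95
weights = inverse_frequency_weights(train_labels)

# The loss can be evaluated on a batch of any size using the same weights.
loss_small_batch = weighted_log_loss([1, 0], [0.5, 0.5], weights)
```

Here the rare class gets a much larger weight than the common one, and the loss function itself accepts any number of examples.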
Topic self-study class-imbalance classification predictive-modeling
Category Data Science