Categorization of approaches to deal with imbalanced classes

What is the best way to categorize the approaches that have been developed to deal with the class imbalance problem?

The first article categorizes them into:

  1. Preprocessing: includes oversampling, undersampling and hybrid methods,
  2. Cost-sensitive learning: includes direct methods and meta-learning, the latter of which further divides into thresholding and sampling,
  3. Ensemble techniques: includes cost-sensitive ensembles and data preprocessing in conjunction with ensemble learning.

The second classification:

  1. Data Pre-processing: includes distribution change and weighting the data space. One-class learning is considered as distribution change.
  2. Special-purpose Learning Methods
  3. Prediction Post-processing: includes threshold method and cost-sensitive post-processing
  4. Hybrid Methods

The third article:

  1. Data-level methods
  2. Algorithm-level methods
  3. Hybrid methods

The last classification also considers output adjustment as an independent approach.

Thanks in advance.



The way I see it, all three categorizations agree on many things. For example, all three have a category for pre-processing steps.

I would mostly agree with the third categorization, as it's more generic and encompasses more things.

  • The data-level category includes any pre-processing steps dealing with class imbalance (e.g. over/under sampling).
  • The algorithm-level category can be seen as covering the second category of each of the first two articles. Any change to the algorithm that deals with class imbalance would go here (e.g. class weighting).
  • Finally, a hybrid category for combining the two.
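
To make the difference between the first two categories concrete, here is a minimal sketch in plain Python. The helper names (random_oversample, balanced_class_weights) and the toy data are made up for illustration, and the weighting formula n_samples / (n_classes * class_count) is just one common heuristic, not something taken from the three articles.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Data-level: duplicate minority rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Draw (with replacement) as many extra rows as this class is short of the target.
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def balanced_class_weights(y):
    """Algorithm-level: leave the data alone, weight each class inversely to its frequency."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Toy data (hypothetical): four majority samples, one minority sample.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]

X_res, y_res = random_oversample(X, y)   # the training set itself changes
weights = balanced_class_weights(y)      # only the loss weighting would change

print(Counter(y_res))
print(weights)
```

The resampled set would be passed to any unmodified learner, whereas the weights would be passed to a learner that accepts per-class weights in its loss. A hybrid method would do a bit of both, for example partial oversampling followed by weighting.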

The only things missing from the first two articles are the post-processing steps, which, to be honest, aren't used in practice as often as the others.
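
For completeness, a post-processing step such as threshold moving can be sketched as follows. This is only an illustration: the scores, the labels, and the 0.3 cut-off are invented, and in practice the threshold would be tuned on a validation set or derived from misclassification costs.

```python
def predict_with_threshold(probabilities, threshold=0.5):
    """Turn predicted minority-class probabilities into labels using a chosen cut-off."""
    return [1 if p >= threshold else 0 for p in probabilities]


def recall(y_true, y_pred):
    """Fraction of actual positives (label 1) that were predicted as positive."""
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    return true_positives / positives


# Hypothetical scores from an already trained classifier; the model is not retrained.
scores = [0.05, 0.10, 0.20, 0.35, 0.45, 0.60]
y_true = [0, 0, 0, 1, 1, 1]

default_pred = predict_with_threshold(scores)        # usual 0.5 cut-off
lowered_pred = predict_with_threshold(scores, 0.3)   # cut-off moved toward the minority class

print(recall(y_true, default_pred), recall(y_true, lowered_pred))
```

Nothing about the data or the learning algorithm changes here; only the decision rule applied to the model's output is adjusted, which is why some taxonomies list it as its own category.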
