Which metric should I use for classifying imbalanced data with fewer labels for the negative class?

From my reading, I understand that when there are fewer positive class labels, it is better to use precision or recall as the evaluation metric. Which metric should I use when there are fewer negative samples?

I'm looking for an approach other than switching the labels.

Problem setting: I'm developing parametrized fragility functions for predicting damage to a structure (for example, trees). An example of a fragility function is here. The fragility function estimates the probability of exceeding a damage state given some parameters (say, wind load). The damage state can be expressed in terms of a damage ratio (0-1, with 1 being fully damaged). We are interested in estimating the probability of exceeding a given damage ratio given features. To elaborate, the probability of any damage would be P(Damage_ratio > 0.0 | features). Logistic regression can be used to learn this curve from data after categorizing the continuous 0-1 damage ratio into damaged (- class) / no damage (+ class) at a particular threshold. As we move the threshold from 0 to 1, the dataset transforms from imbalanced data dominated by damaged cases, to a balanced state, and finally to another imbalanced dataset dominated by no-damage cases.

Now, when learning the model, AUC-ROC performs really well when the data is balanced. Precision performs well when the data is imbalanced with more no-damage cases (P(Damage_ratio > 0.1 | features)). These metrics don't do well for the case with few negative cases (P(Damage_ratio > 0.9 | features)). I tried switching the labels with very limited success. Are there any other metrics that perform well in this imbalanced-data setting?
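For context, the thresholding setup described above can be sketched as follows. This is a minimal sketch on synthetic data: `wind_load` and the monotone damage relationship are invented for illustration, not taken from the actual problem.

```python
# Sketch: binarize a continuous damage ratio at several thresholds and fit
# a logistic regression to each resulting (increasingly imbalanced) dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
wind_load = rng.uniform(0, 100, size=1000)
# Fabricated monotone relationship: higher wind load -> higher damage ratio
damage_ratio = np.clip(wind_load / 100 + rng.normal(0, 0.15, 1000), 0, 1)

X = wind_load.reshape(-1, 1)
for threshold in (0.1, 0.5, 0.9):
    # Label 1 = damage ratio exceeds the threshold
    y = (damage_ratio > threshold).astype(int)
    model = LogisticRegression().fit(X, y)
    # Estimated P(Damage_ratio > threshold | wind_load)
    proba = model.predict_proba(X)[:, 1]
    print(threshold, y.mean(), f1_score(y, model.predict(X)))
```

As the threshold moves from 0.1 to 0.9, the positive-class fraction printed by `y.mean()` shifts from dominant to rare, which is exactly the regime change described above.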



The names of the classes don't matter; you might as well call them class A and class B. In binary classification the typical choice is to evaluate using precision, recall, and F1-score. There are other options, but that depends on the task.

Assuming you choose F1-score, the choice of which class you select as the "positive" class for evaluation also depends on the task. Usually it's recommended to use the minority class because it's the most challenging one for the classifier.

The only problem here is the possible confusion of calling a class "negative" while computing the F1-score with it as the "positive" class, but that's just a naming issue. You could easily add this point to the explanations, or avoid any confusion by calling your classes A and B, for instance.
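To make the naming point concrete, here is a minimal sketch of evaluating F1 with the minority class as the "positive" label, whatever it happens to be called. The labels and predictions below are invented for illustration:

```python
# Compute F1 treating the minority class as "positive", regardless of name.
from collections import Counter
from sklearn.metrics import f1_score

y_true = ["damage"] * 90 + ["no_damage"] * 10   # "no_damage" is the minority
y_pred = (["damage"] * 85 + ["no_damage"] * 5   # 5 false positives
          + ["damage"] * 5 + ["no_damage"] * 5)  # 5 misses, 5 hits

# Pick the minority class automatically and evaluate it as "positive"
counts = Counter(y_true)
minority = min(counts, key=counts.get)
score = f1_score(y_true, y_pred, pos_label=minority)
print(minority, round(score, 3))  # -> no_damage 0.5
```

With `pos_label` set this way, precision, recall, and F1 all describe how well the classifier handles the rare class, which is usually the quantity of interest in an imbalanced setting.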
