Using average precision as the metric for an imbalanced problem (learning curve example)

I have an imbalanced problem (2% target class) and therefore need an appropriate metric, so I chose average_precision.

My code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import StratifiedKFold, learning_curve

# estimator, X and y are defined earlier; train_sizes is e.g. np.linspace(0.1, 1.0, 5)
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

train_sizes, train_scores, test_scores = learning_curve(
    estimator, X, y, cv=cv, n_jobs=2,
    train_sizes=train_sizes, scoring='average_precision')
train_scores_mean = np.mean(train_scores, axis=1)
test_scores_mean = np.mean(test_scores, axis=1)
plt.grid()
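
For completeness, this is roughly how I then plot the two mean curves (the labels here are just illustrative):

plt.plot(train_sizes, train_scores_mean, 'o-', label='Training AP')
plt.plot(train_sizes, test_scores_mean, 'o-', label='Cross-validation AP')
plt.xlabel('Training set size')
plt.ylabel('Average precision')
plt.legend(loc='best')
plt.show()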

However, when I do this, I get a pretty poor result. What can I do differently? Should I undersample? I care about the probabilities, so I am wondering what the best approach to this is. (I am not using xgboost.)

Topic: metric learning training class-imbalance

Category: Data Science


[heavily edited: in the first version of this answer I mistakenly interpreted "average precision" as "precision"]

Based on the information shown on the graph, the model is clearly overfitting, even with the maximum number of instances (visible from the large gap between training and testing performance). This means that the model is too complex given the number of instances. Based on the evolution of the curves, you'd need a lot more instances for the two curves to come close to each other.
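
Concretely, "too complex" is fixed by constraining whatever estimator you are using. Since you didn't say which model it is, the settings below are only illustrative sketches of the kind of knobs to turn, not a prescription:

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# For a linear model: smaller C means stronger L2 regularization
simpler_linear = LogisticRegression(C=0.1, max_iter=1000)

# For a forest: shallower trees and larger leaves reduce capacity
simpler_forest = RandomForestClassifier(max_depth=5, min_samples_leaf=50,
                                        random_state=42)

Rerunning the learning curve with a constrained model should narrow the gap between the two curves, usually at the cost of a lower training score.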

You could indeed try resampling, but it's not certain that it would solve the problem: it would likely improve recall but also decrease precision (more false-positive errors).
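
If you want to experiment with it anyway, here is a minimal sketch using the imbalanced-learn package (my assumption; any random undersampler would do). Keep in mind that probabilities learned on resampled data are distorted relative to the true 2% prevalence, so they would need recalibration afterwards if you care about them:

from imblearn.under_sampling import RandomUnderSampler

# Undersample the majority class to a 1:4 minority:majority ratio (illustrative)
rus = RandomUnderSampler(sampling_strategy=0.25, random_state=42)
X_res, y_res = rus.fit_resample(X, y)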

As is often the case, I suspect that the problem is deeper: the features are just not good indicators of the label. I'd suggest working on this issue to see whether it can be improved.
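
A quick way to sanity-check this (a sketch, assuming binary 0/1 labels in y): compare the model's average precision to the chance baseline, which for average precision is simply the positive-class prevalence (about 0.02 here), and rank the features by a simple univariate score:

import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Chance-level average precision equals the positive-class rate
baseline_ap = y.mean()  # ~0.02 for a 2% target class
print(f"Chance-level AP: {baseline_ap:.3f}")

# Univariate mutual information between each feature and the label;
# values near zero suggest the feature carries little signal on its own
mi = mutual_info_classif(X, y, random_state=42)
for idx in np.argsort(mi)[::-1][:10]:
    print(f"feature {idx}: MI = {mi[idx]:.4f}")

If your model's AP is only modestly above the baseline and most features have near-zero mutual information, better features will help far more than resampling.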
