Using average precision as a metric for an imbalanced problem (learning curve example)
I have an imbalanced problem (2% target class) and therefore need an appropriate metric, so I chose average_precision.
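For context, a no-skill classifier's average precision equals the positive-class prevalence (about 0.02 here), not 0.5 as with ROC AUC, so scores have to be read against that baseline. A quick synthetic check (illustrative data and names, not my real setup):

# No-skill baseline for average precision on a ~2%-positive problem
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(42)
y_true = (rng.random(100_000) < 0.02).astype(int)  # ~2% positives
random_scores = rng.random(100_000)                # uninformative classifier scores
print(average_precision_score(y_true, random_scores))  # ~0.02, i.e. chance level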
My code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import StratifiedKFold, learning_curve

# estimator, X, y are defined elsewhere; train_sizes must be set before use
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
train_sizes = np.linspace(0.1, 1.0, 5)
train_sizes, train_scores, test_scores = learning_curve(
    estimator, X, y, cv=cv, n_jobs=2,
    train_sizes=train_sizes, scoring='average_precision')
train_scores_mean = np.mean(train_scores, axis=1)
test_scores_mean = np.mean(test_scores, axis=1)
plt.grid()
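For completeness, the plotting I do afterwards looks roughly like this (a sketch; the labels and styling are my own choices):

# Plot the learning curve from the means computed above
plt.plot(train_sizes, train_scores_mean, "o-", label="Training AP")
plt.plot(train_sizes, test_scores_mean, "o-", label="Cross-validation AP")
plt.xlabel("Training examples")
plt.ylabel("Average precision")
plt.legend(loc="best")
plt.show()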
However, when I do this, I get a pretty poor result. What can I do differently? Should I undersample? I care about the predicted probabilities, so I am wondering what the best approach is here. (I am not using xgboost.)
Topic: metric, learning, training, class-imbalance
Category: Data Science