Optimising directly for the Brier score with a custom objective gives a worse Brier score than the default objective - what does that tell me?
I am training an XGBoost model, and since I care most about the resulting probabilities rather than the classification itself, I have chosen the Brier score as the metric for my model so that the probabilities are well calibrated. I tuned my hyperparameters using GridSearchCV with brier_score_loss as the scoring metric. Here's an example of a tuning step:
from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV
from sklearn.metrics import brier_score_loss, roc_auc_score, make_scorer
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)
model = XGBClassifier(learning_rate=0.1, n_estimators=200, gamma=0, subsample=0.8, colsample_bytree=0.8, scale_pos_weight=1, verbosity=1, seed=0)
parameters = {'max_depth': [3, 5, 7],
              'min_child_weight': [1, 3, 5]}
# Brier score needs predicted probabilities and should be minimised, so wrap it in a scorer
brier_scorer = make_scorer(brier_score_loss, greater_is_better=False, needs_proba=True)
gs = GridSearchCV(model, parameters, scoring=brier_scorer, n_jobs=1, cv=cv)
gs_results = gs.fit(X_train, y_train)
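The selected hyperparameters can then be read off the fitted search object (a small sketch; with the scorer above, best_score_ is the negated Brier loss):
# Inspect the grid-search result (names follow the code above)
print(gs_results.best_params_)    # best hyperparameter combination found
print(-gs_results.best_score_)    # its mean cross-validated Brier score (sign flipped back)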
Finally, I train my main model with the chosen hyperparameters in two ways:
1. optimising for a custom objective, brier, using a custom brier_error function as the evaluation metric
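The bodies of brier and brier_error are not pasted in this post; a minimal sketch of what such functions could look like is below. The signatures are assumptions based on an xgboost version where a callable objective receives (y_true, raw_margins) and a callable metric passed to fit receives (raw_margins, DMatrix), so treat it only as an illustration.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical custom Brier objective: gradient/hessian of (p - y)^2 w.r.t. the raw margin,
# with p = sigmoid(margin)
def brier(y_true, y_pred):
    p = sigmoid(y_pred)                 # y_pred are raw margins
    s = p * (1.0 - p)                   # dp/dmargin
    grad = 2.0 * (y_true * 0 + (p - y_true)) * s
    hess = 2.0 * s * (s + (p - y_true) * (1.0 - 2.0 * p))
    return grad, hess

# Hypothetical matching evaluation metric: mean squared error between sigmoid(margin) and the label
def brier_error(preds, dtrain):
    y_true = dtrain.get_label()
    p = sigmoid(preds)
    return 'brier_error', float(np.mean((p - y_true) ** 2))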
model1 = XGBClassifier(obj=brier, learning_rate=0.02, n_estimators=2000, max_depth=5,
min_child_weight=1, gamma=0.3, reg_lambda=20, subsample=1, colsample_bytree=0.6,
scale_pos_weight=1, seed=0, disable_default_eval_metric=1)
model1.fit(X_train, y_train, eval_metric=brier_error, eval_set=[(X_train, y_train), (X_test, y_test)],
early_stopping_rounds=100)
y_proba1 = model1.predict_proba(X_test)[:, 1]
brier_score_loss(y_test, y_proba1) # 0.005439
roc_auc_score(y_test, y_proba1) # 0.8567
2. optimising for the default binary:logistic objective, with auc as the evaluation metric
model2 = XGBClassifier(learning_rate=0.02, n_estimators=2000, max_depth=5,
min_child_weight=1, gamma=0.3, reg_lambda=20, subsample=1, colsample_bytree=0.6,
scale_pos_weight=1, seed=0, disable_default_eval_metric=1)
model2.fit(X_train, y_train, eval_metric='auc', eval_set=[(X_train, y_train), (X_test, y_test)],
early_stopping_rounds=100)
y_proba2 = model2.predict_proba(X_test)[:, 1]
brier_score_loss(y_test, y_proba2) # 0.004914
roc_auc_score(y_test, y_proba2) # 0.8721
I would expect the Brier score to be lower for model1, since we optimise for it directly, but apparently that is not the case (see results above). What does this tell me? Is optimising for Brier somehow harder? Should I use more boosting rounds? (Although these hyperparameters were found by grid search with brier_score_loss...) Can it be explained by the data distribution? (e.g. could such an issue occur with imbalanced classes or something like that?) I have no idea where this situation comes from, but there is probably a reason behind it.
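As an extra check on calibration (which is what I ultimately care about), the reliability curves of the two models on the test set could also be compared, e.g. with sklearn's calibration_curve (a sketch; the 10-bin choice is arbitrary):
from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt

# Reliability curves for both models on the held-out test set
frac_pos1, mean_pred1 = calibration_curve(y_test, y_proba1, n_bins=10)
frac_pos2, mean_pred2 = calibration_curve(y_test, y_proba2, n_bins=10)
plt.plot(mean_pred1, frac_pos1, marker='o', label='model1 (custom brier objective)')
plt.plot(mean_pred2, frac_pos2, marker='o', label='model2 (binary:logistic)')
plt.plot([0, 1], [0, 1], linestyle='--', label='perfectly calibrated')
plt.xlabel('Mean predicted probability')
plt.ylabel('Fraction of positives')
plt.legend()
plt.show()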
Topic objective-function machine-learning-model xgboost optimization
Category Data Science