Building a custom scoring function to find mean time-dependent AUC
I’m working on a survival analysis to predict 1-year mortality.
I’m trying to build a custom scoring function that maximizes the mean time-dependent AUC. Here is a description of the time-dependent AUC metric from the scikit-survival package. This custom scoring function would be used in GridSearchCV to select hyperparameters.
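The challenge is that the time-dependent AUC metric takes survival_train (the survival outcomes of the training data) as an argument, while a scoring function only receives the held-out fold. Is it possible to give the metric each fold's survival_train during cross-validation?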
Here is a layout of the code:
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.metrics import cumulative_dynamic_auc

# Instantiate pipeline (preprocessor is not shown here).
cph = Pipeline(steps=[('preprocessor', preprocessor),
                      ('cox_ridge', CoxPHSurvivalAnalysis())])

# Alpha parameters for grid search.
param_grid = {
    'cox_ridge__alpha': [0.1, 1, 10]
}

# The xxx is where survival_train would be passed, but not sure how to do this.
def mean_auc_score(estimator, X, y):
    times = np.arange(28, 365, 14)
    y_pred = estimator.predict(X)
    mean_auc = cumulative_dynamic_auc(xxx, y, y_pred, times)[1]
    return mean_auc

# A callable with the signature (estimator, X, y) can be passed to `scoring`
# directly; make_scorer expects a function of (y_true, y_pred) instead.
mean_auc_scorer = mean_auc_score

# 5 fold cross-validation
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# Grid search
gcv_cox_r = GridSearchCV(estimator=cph,
                         param_grid=param_grid,
                         scoring=mean_auc_scorer,
                         cv=cv)

# Fit model
gcv_cox_r.fit(training_x, training_act_y)
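One possible way around the missing survival_train is to wrap the model so that it remembers the labels it was fitted on. During cross-validation, fit() only ever sees the training fold, so score() can then hand those labels to the metric. The sketch below uses no dependencies and illustrates only that pattern: `TrainAwareScorer`, `MeanPredictor` and `toy_metric` are hypothetical names made up for the illustration, and the toy metric is not an AUC.

```python
class TrainAwareScorer:
    """Hypothetical wrapper: keeps the labels seen in fit() for use in score()."""

    def __init__(self, estimator, metric):
        self.estimator = estimator
        self.metric = metric  # called as metric(y_train, y_test, y_pred)

    def fit(self, X, y):
        # In cross-validation, fit() receives only the training fold, so these
        # are the labels a metric would need as its survival_train argument.
        self.train_y_ = list(y)
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        return self.estimator.predict(X)

    def score(self, X, y):
        # The metric gets the stored training labels plus the held-out fold.
        return self.metric(self.train_y_, y, self.predict(X))


class MeanPredictor:
    """Toy stand-in for a real model: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def toy_metric(y_train, y_test, y_pred):
    """Toy stand-in for a metric that needs training labels (not an AUC)."""
    # Negative mean absolute error, scaled by the range of the training labels.
    scale = max(y_train) - min(y_train)
    mae = sum(abs(t - p) for t, p in zip(y_test, y_pred)) / len(y_test)
    return -mae / scale


X = [[i] for i in range(6)]
y = [10, 20, 30, 40, 50, 60]

# One manual fold: rows 0-3 are the training fold, rows 4-5 the test fold.
model = TrainAwareScorer(MeanPredictor(), toy_metric).fit(X[:4], y[:4])
fold_score = model.score(X[4:], y[4:])
print(model.train_y_, fold_score)
```

In the scikit-survival setting, the metric could be a small function that returns `cumulative_dynamic_auc(y_train, y_test, y_pred, times)[1]`. The wrapper would also need to inherit from `sklearn.base.BaseEstimator` so that GridSearchCV can clone it, `scoring` would be left unset so that the wrapper's own score() is used, and the grid keys would gain an `estimator__` prefix (for example `estimator__cox_ridge__alpha`). Recent scikit-survival releases appear to ship a ready-made wrapper of this kind, `sksurv.metrics.as_cumulative_dynamic_auc_scorer`, which is worth checking in the documentation of the installed version before writing your own.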
Topic survival-analysis cross-validation python machine-learning
Category Data Science