Building a custom scoring function to find mean time-dependent AUC

I’m working on a survival analysis to predict 1-year mortality.

I’m trying to build a custom scoring function that returns the mean time-dependent AUC, as computed by cumulative_dynamic_auc from the scikit-survival package. The scorer would be passed to GridSearchCV so that hyperparameters are selected to maximize this metric.

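The challenge is that cumulative_dynamic_auc takes the training survival data (survival_train) as its first argument, which it uses to estimate the censoring distribution, whereas a scorer only receives the fitted estimator and the held-out fold. Is it possible to access survival_train for each fold during cross-validation?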

Here is a layout of the code:

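import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.metrics import cumulative_dynamic_auc

# Instantiate pipeline (preprocessor is defined earlier, not shown here).
cph = Pipeline(steps=[('preprocessor', preprocessor),
                      ('cox_ridge', CoxPHSurvivalAnalysis())])

# Alpha parameters for grid search.
param_grid = {
    'cox_ridge__alpha': [0.1, 1, 10]
}

# The xxx is where survival_train would be passed, but I am not sure how to do this.
def mean_auc_score(estimator, X, y):
    times = np.arange(28, 365, 14)  # every 14 days, from day 28 up to day 364
    y_pred = estimator.predict(X)   # risk scores from the fitted pipeline
    # cumulative_dynamic_auc returns (auc_per_time, mean_auc); keep the mean.
    mean_auc = cumulative_dynamic_auc(xxx, y, y_pred, times)[1]
    return mean_auc

# A callable with signature (estimator, X, y) can be passed to `scoring` as is;
# make_scorer instead expects a metric with signature (y_true, y_pred).
mean_auc_scorer = mean_auc_score

# 5-fold cross-validation
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# Grid search
gcv_cox_r = GridSearchCV(estimator=cph,
                         param_grid=param_grid,
                         scoring=mean_auc_scorer,
                         cv=cv)

# Fit model
gcv_cox_r.fit(training_x, training_act_y)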
