How do I predict survival curves using xgboost?

The xgboost package supports survival modeling through the parameters objective = "survival:cox" and eval_metric = "cox-nloglik".
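With objective = "survival:cox", xgboost expects the event indicator to be folded into the label: a positive label is an observed event time and a negative label marks a right-censored time. The minimal Python sketch below shows that encoding; the helper name encode_cox_labels is hypothetical (in R the same encoding is ifelse(status == 1, time, -time)), and the commented xgboost calls are only indicative of how the labels and parameters would be used.

```python
# Hypothetical helper: encode right-censored survival data in the label
# convention used by xgboost's objective="survival:cox", where a positive
# label is an observed event time and a negative label marks a
# right-censored time (stored as -time).
from typing import List, Sequence


def encode_cox_labels(times: Sequence[float], events: Sequence[int]) -> List[float]:
    """Return signed labels: +time for events (1), -time for censored rows (0)."""
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    labels = []
    for t, e in zip(times, events):
        if t <= 0:
            # A zero time cannot carry a sign, so require strictly positive times.
            raise ValueError("survival times must be strictly positive")
        labels.append(float(t) if e == 1 else -float(t))
    return labels


# Training parameters mentioned in the question.
params = {"objective": "survival:cox", "eval_metric": "cox-nloglik"}

if __name__ == "__main__":
    labels = encode_cox_labels([5.0, 8.0, 12.0], [1, 0, 1])
    print(labels)  # [5.0, -8.0, 12.0]
    # With xgboost installed, these would be used roughly as:
    #   dtrain = xgboost.DMatrix(X, label=labels)
    #   booster = xgboost.train(params, dtrain, num_boost_round=100)
```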

The predict method for the resulting model only outputs risk scores (the same as predict(fit, type = "risk") for a survival::coxph model in R).

How do I use xgboost to predict entire survival curves?

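The proportional hazards model assumes hazard rates of the form $h(t|X) = h_0(t) \cdot risk(X)$, where usually $risk(X) = \exp(X\beta)$ (with survival:cox, xgboost replaces the linear predictor $X\beta$ by the output of the boosted trees). The xgboost predict method returns only $risk(X)$. What we can do is use the survival::basehaz function to estimate the baseline hazard $h_0(t)$; note that basehaz reports it in cumulative form, $H_0(t) = \int_0^t h_0(s)\,ds$.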
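Once a cumulative baseline hazard $H_0(t) = \int_0^t h_0(s)\,ds$ is available, the survival curve for a covariate vector $X$ follows from the model as $S(t|X) = \exp(-H_0(t) \cdot risk(X))$. The Python sketch below illustrates this with the Breslow estimator of $H_0(t)$, computed directly from the risk scores. This is a swapped-in alternative shown for illustration, not the basehaz-based method described here; the function names are hypothetical and the risk scores are assumed to be on the hazard-ratio scale that xgboost's predict returns for survival:cox.

```python
# Sketch: Breslow estimate of the cumulative baseline hazard H0(t) computed
# directly from risk scores r_i = risk(x_i) (e.g. xgboost survival:cox
# predictions). This is a swapped-in alternative to survival::basehaz.
import math
from bisect import bisect_right
from typing import List, Sequence, Tuple


def breslow_cumulative_hazard(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Return (event_times, H0 at each event time) as a right-continuous step function."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    h0_values: List[float] = []
    cumulative = 0.0
    for tk in event_times:
        # d_k: number of events observed exactly at tk.
        deaths = sum(1 for t, e in zip(times, events) if e == 1 and t == tk)
        # Sum of risk scores over the risk set {j : t_j >= tk}.
        at_risk = sum(r for t, r in zip(times, risk) if t >= tk)
        cumulative += deaths / at_risk
        h0_values.append(cumulative)
    return event_times, h0_values


def step_value(grid: Sequence[float], values: Sequence[float], t: float) -> float:
    """Evaluate the step function at t; it is 0 before the first event time."""
    i = bisect_right(grid, t)
    return values[i - 1] if i > 0 else 0.0


def survival_curve(
    grid: Sequence[float], h0: Sequence[float], risk_x: float, eval_times: Sequence[float]
) -> List[float]:
    """S(t | x) = exp(-H0(t) * risk(x)) at each requested time."""
    return [math.exp(-step_value(grid, h0, t) * risk_x) for t in eval_times]


if __name__ == "__main__":
    times = [2.0, 3.0, 5.0, 7.0]
    events = [1, 1, 0, 1]        # the subject at t = 5 is right-censored
    risk = [1.0, 1.0, 1.0, 1.0]  # placeholder risk scores
    grid, h0 = breslow_cumulative_hazard(times, events, risk)
    # grid is [2.0, 3.0, 7.0]; H0 steps are 1/4, 1/4 + 1/3, 1/4 + 1/3 + 1.
    print(grid, h0)
    print(survival_curve(grid, h0, risk_x=2.0, eval_times=[1.0, 2.5, 4.0, 8.0]))
```

Because each Breslow increment divides by a sum of the same risk scores, multiplying every score by a constant $k$ divides $H_0(t)$ by $k$ and leaves $S(t|X)$ unchanged, so this estimate is tied to whatever scale the risk scores are on.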

The problem is that this baseline hazard is not "calibrated" to the risk scale of the xgboost model. What we can do is find a constant $C$ that minimizes the integrated Brier score (ibrier) between the observed death/censoring times in the sample and the survival curves implied by the hazard $h_0(t) \cdot risk(X) \cdot C$.
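The calibration step can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's package code: the cumulative baseline hazard is passed in as a step function (h0_times, h0_values), the Brier score for censored data uses inverse-probability-of-censoring weights (IPCW) from a Kaplan-Meier fit of the censoring times, and $C$ is chosen by a plain grid search. All function names are hypothetical.

```python
# Sketch of the calibration step: choose the constant C that minimizes the
# integrated Brier score of S_i(t) = exp(-C * H0(t) * risk_i).
# Assumptions (not from the original answer): H0 is given as a step function
# (h0_times, h0_values), the Brier score uses inverse-probability-of-censoring
# weights from a Kaplan-Meier fit of the censoring times, and C is picked by
# a simple grid search.
import math
from bisect import bisect_right
from typing import Callable, Sequence, Tuple


def step(grid: Sequence[float], values: Sequence[float], t: float) -> float:
    """Right-continuous step function that is 0 before the first grid point."""
    i = bisect_right(grid, t)
    return values[i - 1] if i > 0 else 0.0


def censoring_survival(times, events, t: float, strict: bool) -> float:
    """Kaplan-Meier estimate G(t) = P(C > t); with strict=True, its left limit G(t-)."""
    g = 1.0
    for c in sorted({ti for ti, ei in zip(times, events) if ei == 0}):
        if c > t or (strict and c == t):
            break
        n_risk = sum(1 for ti in times if ti >= c)
        n_cens = sum(1 for ti, ei in zip(times, events) if ei == 0 and ti == c)
        g *= 1.0 - n_cens / n_risk
    return g


def brier_score(times, events, surv: Callable[[int, float], float], t: float) -> float:
    """IPCW Brier score at time t; censored subjects with T_i <= t get weight 0."""
    total = 0.0
    for i, (ti, ei) in enumerate(zip(times, events)):
        s = surv(i, t)
        if ti <= t and ei == 1:
            total += s ** 2 / censoring_survival(times, events, ti, strict=True)
        elif ti > t:
            total += (1.0 - s) ** 2 / censoring_survival(times, events, t, strict=False)
    return total / len(times)


def integrated_brier(times, events, surv, eval_times: Sequence[float]) -> float:
    """Trapezoidal average of the Brier score over eval_times."""
    bs = [brier_score(times, events, surv, t) for t in eval_times]
    area = sum(
        (eval_times[k + 1] - eval_times[k]) * (bs[k] + bs[k + 1]) / 2
        for k in range(len(eval_times) - 1)
    )
    return area / (eval_times[-1] - eval_times[0])


def calibrate_constant(
    times, events, risk, h0_times, h0_values, eval_times, candidates
) -> Tuple[float, float]:
    """Return (C, IBS) for the candidate C with the smallest integrated Brier score."""

    def ibs_for(c: float) -> float:
        def surv(i: int, t: float) -> float:
            return math.exp(-c * step(h0_times, h0_values, t) * risk[i])

        return integrated_brier(times, events, surv, eval_times)

    best = min(candidates, key=ibs_for)
    return best, ibs_for(best)
```

eval_times should be sorted, contain at least two points, and stay within the observed follow-up. In practice a one-dimensional optimizer (for example R's optimize) over a range of $C$ values would replace the grid search.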

I've implemented this approach in a small R package.
