Survival Analysis: Pseudo Observation Vs Stratified Cox Regression. Which one is better?

Question


Ajay H

23 April 2022, 02:00

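I've been looking into Cox regression for survival analysis in churn prediction. Cox regression models the hazard rate, i.e. the instantaneous rate at which subscriber $i$, still subscribed at time $t$, unsubscribes; the probability that the subscriber is still subscribed after time $t$ can then be derived from it. The hazard rate is defined as: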

$$ h(t \mid \boldsymbol{X}_i) = h_0(t)\exp\big( \boldsymbol{\beta}^T \boldsymbol{X}_{i} \big) $$

where

$h_0(t)$: the baseline hazard, i.e. the hazard (a rate, not a probability) of churning at time $t$ for a customer whose covariates are all 0.
$\boldsymbol{\beta} \in \mathbb{R}^D$: the coefficients. The exponential of each coefficient, $e^{\beta_d}$, is a hazard ratio, and these ratios are assumed constant with respect to time (proportional hazards assumption).
$\boldsymbol{X}\in \mathbb{R}^{N\times D}$: covariates of the $N$ sample customers; row $\boldsymbol{X}_i$ belongs to customer $i$.
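A minimal NumPy sketch of the hazard formula above, using a made-up baseline hazard `baseline` and hypothetical coefficients (neither comes from a fitted model). Because $h_0(t)$ cancels, the ratio of two customers' hazards equals $\exp\big(\boldsymbol{\beta}^T(\boldsymbol{X}_a - \boldsymbol{X}_b)\big)$ at every $t$, which is exactly the proportionality the model assumes.

```python
import numpy as np

def cox_hazard(t, x, beta, baseline_hazard):
    """h(t | x) = h_0(t) * exp(beta^T x) for a single covariate vector x."""
    return baseline_hazard(t) * np.exp(np.dot(beta, x))

def baseline(t):
    # Hypothetical baseline hazard that increases with tenure t (e.g. in months).
    return 0.01 + 0.002 * t

beta = np.array([0.5, -0.3])   # hypothetical coefficients, D = 2
x_a = np.array([1.0, 2.0])     # customer A
x_b = np.array([0.0, 1.0])     # customer B

for t in [1.0, 6.0, 12.0]:
    ratio = cox_hazard(t, x_a, beta, baseline) / cox_hazard(t, x_b, beta, baseline)
    # The baseline cancels, so the ratio is exp(beta @ (x_a - x_b)) = exp(0.2) at every t.
    print(t, round(ratio, 4))
```

Any positive function could stand in for `baseline`; the ratio would be unchanged.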

Problem: the proportional hazards assumption. Cox regression assumes that hazard ratios remain constant over time $t$. For example, take a covariate $X_1$ = "gender", coded 1 for male and 0 for female, and say $e^{\beta_1}=1.8$ (i.e. $\beta_1 = \ln 1.8 \approx 0.59$). In plain English, male subscribers have an $80\%$ higher hazard of leaving the service than female subscribers, and this $80\%$ must hold at every time $t$.
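The arithmetic behind that interpretation, as a small sketch (the helper names are hypothetical): a hazard ratio is $e^{\beta}$, so an $80\%$ higher hazard means $\beta_1 = \ln 1.8$, whereas reading a coefficient of $1.8$ directly as the ratio would understate the effect, since $e^{1.8} \approx 6.05$.

```python
import math

def hazard_ratio(beta):
    """Multiplicative change in the hazard per unit increase of the covariate."""
    return math.exp(beta)

def percent_change(beta):
    """Hazard change in percent, e.g. 80 means an 80% higher hazard."""
    return (hazard_ratio(beta) - 1.0) * 100.0

beta_for_80 = math.log(1.8)                 # coefficient that gives "80% more"
print(round(beta_for_80, 3))                # 0.588
print(round(percent_change(beta_for_80)))   # 80
print(round(hazard_ratio(1.8), 2))          # 6.05, if beta_1 itself were 1.8
```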

This is an unreasonable constraint for many variables, but there are other methods that can incorporate variables that don't follow the proportional hazards assumption:

stratified Cox regression
pseudo-observations
Cox regression with time-dependent covariates

I was just reading up on stratified Cox regression. Its apparent downsides are:

The stratification variables need to be categorical, so continuous variables have to be binned.
The stratification variables should not have too many levels. Otherwise there will be a LARGE number of strata, each with its own baseline hazard to estimate from fewer observations (the coefficients $\boldsymbol{\beta}$ are shared across strata unless interactions are added).
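To make the stratification trade-off concrete, here is a minimal NumPy sketch of the stratified Cox partial likelihood (the function name and toy data are hypothetical, and tied event times are assumed away). The coefficients are shared by all strata; stratifying only restricts each risk set to subjects in the same stratum. With many sparsely populated strata, risk sets become small and each event carries little information about $\boldsymbol{\beta}$.

```python
import numpy as np

def stratified_cox_neg_log_pl(beta, X, time, event, strata):
    """Negative log partial likelihood of a stratified Cox model.

    Assumes no tied event times. beta is shared by all strata; each risk set
    only contains subjects from the same stratum, which is equivalent to giving
    every stratum its own unspecified baseline hazard.
    """
    eta = X @ beta                            # linear predictor beta^T X_i
    total = 0.0
    for s in np.unique(strata):
        in_s = strata == s
        eta_s, t_s, e_s = eta[in_s], time[in_s], event[in_s]
        for i in np.flatnonzero(e_s == 1):    # each observed churn in stratum s
            at_risk = t_s >= t_s[i]           # still subscribed, same stratum only
            total += eta_s[i] - np.log(np.sum(np.exp(eta_s[at_risk])))
    return -total

# Toy data: one covariate, two strata (e.g. two hypothetical plan types).
X = np.array([[1.0], [0.0], [1.0], [0.0], [1.0], [0.0]])
time = np.array([2.0, 3.0, 5.0, 1.0, 4.0, 6.0])
event = np.array([1, 1, 0, 1, 1, 0])          # 0 = censored (still subscribed)
strata = np.array([0, 0, 0, 1, 1, 1])

# Crude 1-D grid search for the shared coefficient; a real fit would use a
# proper optimizer.
grid = np.linspace(-3.0, 3.0, 601)
losses = [stratified_cox_neg_log_pl(np.array([b]), X, time, event, strata) for b in grid]
beta_hat = grid[int(np.argmin(losses))]
print(beta_hat)
```

At $\boldsymbol{\beta} = 0$ each event contributes $\log(\text{risk-set size})$, so the toy data give $2\log 3 + 2\log 2 = \log 36$; splitting the same subjects into more strata would shrink those risk sets further.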

Question: Is the pseudo-observation approach similar? Does it have less or more rigid constraints? And how does it perform, given that I have copious amounts of data?
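For comparison, a minimal sketch of how pseudo-observations are commonly formed, assuming the target is the survival probability $S(t_0)$ at a single hypothetical horizon $t_0$, estimated by Kaplan-Meier (function names are hypothetical). Subject $i$'s pseudo-value is the jackknife quantity $n\hat S(t_0) - (n-1)\hat S_{-i}(t_0)$; these values then serve as the response of a regression on the covariates (commonly a generalized estimating equation with a link such as complementary log-log), a step not shown here. Without censoring the pseudo-value reduces to the indicator $\mathbb{1}(T_i > t_0)$. Note that this naive loop refits Kaplan-Meier $n$ times per horizon, which matters with very large datasets.

```python
import numpy as np

def km_survival_at(t0, time, event):
    """Kaplan-Meier estimate of S(t0): product over event times u <= t0."""
    s = 1.0
    for u in np.unique(time[(event == 1) & (time <= t0)]):
        at_risk = np.sum(time >= u)
        churned = np.sum((time == u) & (event == 1))
        s *= 1.0 - churned / at_risk
    return s

def pseudo_observations(t0, time, event):
    """Jackknife pseudo-values n * S(t0) - (n - 1) * S_{-i}(t0), one per subject."""
    n = len(time)
    s_full = km_survival_at(t0, time, event)
    keep = np.ones(n, dtype=bool)
    pseudo = np.empty(n)
    for i in range(n):
        keep[i] = False                       # leave subject i out
        s_loo = km_survival_at(t0, time[keep], event[keep])
        pseudo[i] = n * s_full - (n - 1) * s_loo
        keep[i] = True
    return pseudo

# No censoring: pseudo-values equal the indicator 1(T_i > t0).
time = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
event = np.array([1, 1, 1, 1, 1])
print(pseudo_observations(2.5, time, event))

# With censoring (event = 0), the censored subject gets a fractional value and
# other pseudo-values can fall outside [0, 1].
event_censored = np.array([0, 1, 1, 1, 1])
print(pseudo_observations(3.5, time, event_censored))
```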

Topic survival-analysis statistics machine-learning


Gino_JrDataScientist · Accepted Answer · 21 May 2018, 10:19

I suggest using a model with more relaxed assumptions on the proportionality of hazards. In my work I use a piecewise constant hazard model, which works very well. It only assumes that the hazards are proportional within each time interval. It allows numerical covariates (via splines) and time-dependent covariates. Moreover, in my experience the model is usually very well calibrated and does not overfit much.
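The answer gives no code, so the following is only a minimal NumPy sketch of the piecewise constant hazard idea (cut points, toy data and function names are hypothetical). Follow-up is split into person-period rows at chosen cut points; within each interval the hazard is constant, and without covariates its maximum-likelihood estimate is the number of churn events divided by total exposure. With covariates, the same rows are typically fitted as a Poisson regression with $\log(\text{exposure})$ as an offset, and interacting a covariate with the interval index lets its hazard ratio differ between intervals; that regression step is not shown.

```python
import numpy as np

def split_into_intervals(time, event, cuts):
    """Expand each subject into one row per interval it was observed in.

    Returns an array of rows (subject, interval, exposure, churned).
    """
    edges = np.concatenate(([0.0], cuts, [np.inf]))
    rows = []
    for i, (t, e) in enumerate(zip(time, event)):
        for k in range(len(edges) - 1):
            lo, hi = edges[k], edges[k + 1]
            if t <= lo:                         # subject left before this interval
                break
            exposure = min(t, hi) - lo          # time spent at risk in the interval
            churned = int(e == 1 and t <= hi)   # event falls inside this interval
            rows.append((i, k, exposure, churned))
    return np.array(rows, dtype=float)

def piecewise_hazards(rows, n_intervals):
    """Covariate-free MLE of each interval's constant hazard: events / exposure.
    Assumes every interval has positive total exposure."""
    k = rows[:, 1].astype(int)
    events = np.bincount(k, weights=rows[:, 3], minlength=n_intervals)
    exposure = np.bincount(k, weights=rows[:, 2], minlength=n_intervals)
    return events / exposure

time = np.array([2.0, 5.0, 7.0])
event = np.array([1, 0, 1])
cuts = np.array([3.0, 6.0])                     # hypothetical cut points -> 3 intervals
rows = split_into_intervals(time, event, cuts)
print(piecewise_hazards(rows, len(cuts) + 1))   # per-interval hazards 0.125, 0.0, 1.0
```

Finer cut points give a more flexible baseline but fewer events per interval, which is the same bias/variance trade-off as choosing the number of strata.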

