survival-analysis

Using survival analysis models with uncensored data for time-to-event prediction

Mykola Zotko

June 4, 2022, 10:12

Are there any advantages of using survival analysis models like Cox's proportional hazards model with uncensored data over simple linear regression or other classic ML models? I have data with recurrent events and I am trying to predict the time of the next event. The data contain about 2000 different subjects and about 60 events per subject. The percentage of censored data (the last event of each subject) is small, and I don't think it plays a big role in the prediction.

Topic: survival-analysis time-series machine-learning

Category: Data Science

What kind of algorithm should I use to build an ML model that can predict just the next recurrence of an event in the future (at irregular time intervals)?

S. Joshi

May 14, 2022, 11:00

I'm quite new to machine learning and statistics. I have a dataset of e-commerce sales history. It has almost 2k instances, and the features include personId (string), productCategory (string/discrete), amountPaid (float/continuous), and purchaseTime (string, DD/MM/YYYY). A person can purchase a product at any time (at irregular time intervals, so I guess I can't use time series analysis). I want to know when the same person (identified by personId) will make their next purchase in a category (given by productCategory). What ML model should I use for …

Topic: forecasting survival-analysis classification time-series machine-learning

Category: Data Science

Survival analysis on LTV for a subscription business

Kassem Hussein

May 5, 2022, 13:33

I'm trying to predict the expected LTV of a subscriber. Since monthly revenue and costs are almost constant, I only need to predict the survival function, where the terminal event would be the subscription cancellation request. I proposed the following formula to estimate LTV: $LTV = (Membership - Cost)*mean\ residual\ life(x)$ where: $mean\ residual\ life(x)=E(X-x|X>x)= \frac{\int_{x}^{\infty}S(t)dt}{S(x)}$ In my case I have data on all subscribers over the last 10 years (more than 3 million data points where 1 million are …

Topic: survival-analysis

Category: Data Science
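
The mean residual life formula quoted in the question above can be approximated numerically once a survival curve is available on a time grid. The following is only a minimal sketch: the exponential survival curve, the monthly cancellation rate of 0.05, and the margin of 18 per month are made-up illustration values, not figures from the question, and the integral is truncated at the end of the grid.

```python
import numpy as np

def mean_residual_life(times, survival, x):
    """Approximate E[X - x | X > x] = (integral of S(t) from x upward) / S(x).

    The integral is truncated at the last grid point, so the grid should
    extend far enough that S(t) is close to zero there.
    """
    times = np.asarray(times, dtype=float)
    survival = np.asarray(survival, dtype=float)
    s_x = np.interp(x, times, survival)          # S(x) by linear interpolation
    later = times > x
    t = np.concatenate(([x], times[later]))
    s = np.concatenate(([s_x], survival[later]))
    area = np.sum((s[1:] + s[:-1]) / 2.0 * np.diff(t))  # trapezoidal rule
    return area / s_x

# Hypothetical example: exponential survival with a monthly cancellation rate of 0.05.
rate = 0.05
grid = np.linspace(0.0, 400.0, 40001)
surv = np.exp(-rate * grid)

mrl = mean_residual_life(grid, surv, x=12.0)   # subscriber already active for 12 months
monthly_margin = 18.0                          # hypothetical (Membership - Cost)
ltv = monthly_margin * mrl
```

For an exponential survival curve the mean residual life equals 1 / rate regardless of x, which gives a convenient sanity check for the numerical integration.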

Survival Analysis: Pseudo Observation Vs Stratified Cox Regression. Which one is better?

Ajay H

April 23, 2022, 02:00

I've been looking into the Cox regression method for survival analysis in churn prediction. Cox regression will allow us to determine the probability that a subscriber will unsubscribe after a time $t$, defined by the hazard rate: $$ h(t \lvert X_i ) = h_0(t)\exp\big( \boldsymbol{\beta} ^T\boldsymbol{X}_{i} \big) $$ where $h_0(t)$: the baseline hazard, a prior probability that any customer churns at time $t$ when all influencing factors are 0. $\boldsymbol{\beta} \in \mathbb{R}^D$: the exponent of each coefficient gives us a hazard …

Topic: survival-analysis statistics machine-learning

Category: Data Science

How is the Kaplan-Meier survival function affected if there is no censoring?

StrugglingResearcher

April 1, 2022, 06:46

Basically what the question above asks. The KM survival function uses censored observations until the time they are censored. But how would the estimate at each point in time be affected if we assume from the start that there is no censoring at all in the data? Thanks in advance!

Topic: survival-analysis

Category: Data Science
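
As a small illustration of the question above: when every duration is an observed event, the Kaplan-Meier product telescopes to the empirical survival function, that is, the fraction of subjects still alive after each event time. The sketch below uses a hand-written estimator and made-up durations; the function name `kaplan_meier` is only illustrative.

```python
import numpy as np

def kaplan_meier(durations, observed):
    """Return (event_times, survival) for the Kaplan-Meier estimator."""
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    event_times = np.unique(durations[observed])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)               # subjects under observation just before t
        deaths = np.sum((durations == t) & observed)   # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Made-up durations with no censoring at all.
durations = np.array([2, 3, 3, 5, 8, 8, 9, 12])
all_observed = np.ones(len(durations), dtype=bool)

times, km = kaplan_meier(durations, all_observed)

# Empirical survival function: share of subjects with duration greater than t.
empirical = np.array([np.mean(durations > t) for t in times])
```

With censored observations the two curves generally differ, because a censored subject leaves the risk set without contributing an event.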

How do I predict survival curves using xgboost?

Iyar Lin

March 31, 2022, 14:07

The xgboost package enables survival modeling using the parameter arguments objective = "survival:cox" and eval_metric = "cox-nloglik". The predict method for the resulting model only outputs risk scores (same as type = "risk" in the survival::coxph function in R). How do I use xgboost to predict entire survival curves?

Topic: xgboost survival-analysis

Category: Data Science
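
One common way to turn proportional-hazards risk scores into survival curves is to estimate the cumulative baseline hazard with the Breslow estimator and then apply S(t | x) = exp(-H0(t) * risk(x)). The sketch below is not taken from the question or from xgboost itself: it assumes that `risk` holds hazard-ratio-scale scores like those described above, and the data and function names are made up for illustration.

```python
import numpy as np

def breslow_baseline(durations, events, risk):
    """Breslow estimate of the cumulative baseline hazard H0(t) at each event time."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    event_times = np.unique(durations[events])
    increments = np.array([
        # number of events at t divided by the total risk of subjects still at risk at t
        np.sum(events & (durations == t)) / np.sum(risk[durations >= t])
        for t in event_times
    ])
    return event_times, np.cumsum(increments)

def survival_curves(cum_baseline, risk_new):
    """S(t | x) = exp(-H0(t) * risk(x)); one row per subject, one column per event time."""
    return np.exp(-np.outer(np.asarray(risk_new, dtype=float), cum_baseline))

# Made-up training data; risk stands in for the model's risk scores on the training set.
durations = np.array([5, 8, 8, 12, 15])
events = np.array([1, 1, 0, 1, 0])
risk = np.array([1.5, 0.7, 1.0, 2.0, 0.5])

event_times, H0 = breslow_baseline(durations, events, risk)
curves = survival_curves(H0, risk_new=[0.5, 2.0])   # a low-risk and a high-risk subject
```

The resulting curves are step functions evaluated at the observed event times; a subject with a higher risk score gets a uniformly lower curve, as the proportional hazards assumption implies.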

How to get the survival duration prediction for each individual in the data by using the Kaplan-Meier method?

Kristada673

February 23, 2022, 18:07

I am trying to learn how to use the Kaplan-Meier survival estimator model in the lifelines package. The documentation says that the KaplanMeierFitter.fit function returns "a modified self, with new properties like 'survival_function_'." I checked the contents of survival_function_, and it seems to contain the average survival probability for all the players in the dataset at each time interval. For example, in my dataset, there are 66 months and about 250,000 players (i.e., individuals whose death event we …

Topic: data-analysis survival-analysis python predictive-modeling machine-learning

Category: Data Science

Time Series Classification with multiple rows per date

NuValue

February 8, 2022, 23:02

I have a time series data set with the lifecycle of 9000 different B2B sales leads. What I call the lifecycle consists of a dataset with one record per day for every sales lead identifier, with 4 predictive variables (DAYS_SINCE_START, LEAD_ID, CUSTOMER_INTEREST, MARKET, TYPE_SERVICE) and one response variable (OUTCOME). The response variable can take 2 different values: Won (1) or Lost (0). A mock example of the data frame would be the following: As can be seen, some …

Topic: lstm survival-analysis classification time-series

Category: Data Science

Memory issues for AalenAdditiveFitter in the lifelines package in Python

Protik Nag

February 8, 2022, 14:40

We are working on a problem related to survival analysis. We have already implemented the Cox proportional hazards model and the accelerated failure time algorithm. Now we want to see how the covariates change over time, so we decided to use AalenAdditiveFitter from the lifelines library. Here is some dummy data. Data shape is (1341799, 4). Gender Disability_level Time_to_event Event 1 Female Mild 50 0 2 Male Moderate 70 1 3 Male Severe . . . 1341799 Female Mild 45 1 Now, …

Topic: survival-analysis python machine-learning

Category: Data Science

Correctly plotting CCDF of network one-way delay

eemilk

January 3, 2022, 13:15

I have a histogram of values from a test network setup. The values are from iperf 2.1.6. I send a stream of data and get how many packets fall into each bin of microseconds, bin(w=100us). I sometimes lose some packets. Question: I am wondering how to correctly take the lost packets into account when plotting the CCDF. For now I am calculating the Y-axis values with: (lost_packets + cum_sum(x))/total_packets actual code delay_data = np.random.uniform(low=5, high=62.4, size=(110,)) count_data = np.random.uniform(low=1, high=800, size=(110,)) df = pd.DataFrame({"count_bin": count_data, …

Topic: survival-analysis python

Category: Data Science
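
One possible convention for the question above, offered only as a sketch, is to treat every lost packet as having an infinite delay, so that it counts as exceeding every bin. Under that assumption the CCDF never drops below lost / total. The function name and the counts below are made up, and `total` is assumed to be received plus lost packets.

```python
import numpy as np

def ccdf_with_losses(bin_counts, lost_packets):
    """CCDF at the upper edge of each delay bin, counting lost packets as infinite delay."""
    counts = np.asarray(bin_counts, dtype=float)
    received = counts.sum()
    total = received + lost_packets
    # Packets whose delay exceeds bin i: everything received in later bins, plus all lost ones.
    above = received - np.cumsum(counts) + lost_packets
    return above / total

# Made-up histogram: packets per 100 us bin, ordered by increasing delay.
counts = [10, 30, 40, 20]
no_loss = ccdf_with_losses(counts, lost_packets=0)
with_loss = ccdf_with_losses(counts, lost_packets=25)
```

With no losses the curve ends at zero; with 25 lost packets out of 125 it flattens at 0.2, which makes the loss rate visible as the floor of the plot.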

Late entry in Survival Analysis

gbarel

December 5, 2021, 15:56

I would like to ask how to deal with new entries of individuals in survival analysis. I have a study about the time to event of several individuals who suffer from a disease. The study starts on a specified date (let's assume 1/1/2019). There are 50 individuals on this date. The study lasts 6 months. During these 6 months, more individuals must be included, but they were not present on the starting date. I do not have any left censoring because, …

Topic: survival-analysis

Category: Data Science

Inference over a fixed term - what analysis am I doing here?

SupplyRobot

September 28, 2021, 17:49

I have a fixed term of, say, one year. At the end of the term there is an observation of true / false, say a customer either renews or cancels their subscription. This decision is probably based on the occurrence of certain events, say "how many times did they use the service?", and maybe even the specific timing within this term. At the beginning of the term (say, day 1) I don't have any behavioral information, so I can just …

Topic: bayesian survival-analysis

Category: Data Science

Building a custom scoring function to find mean time-dependent AUC

green_table

August 28, 2021, 01:59

I’m working on a survival analysis to predict 1-year mortality. I’m trying to build a custom score function that maximizes mean time-dependent AUC. Here is a description of the time-dependent AUC metric from the scikit-survival package. This custom score function would be used in GridSearchCV to select hyperparameters. The challenge is that the time-dependent AUC metric requires calling on survival_train. Is it possible to call survival_train within cross-validation folds? Here is a layout of the code: # Instantiate …

Topic: survival-analysis cross-validation python machine-learning

Category: Data Science

Like Time-To-Event analysis, but looking at the timing of events that do or do not happen on a binary outcome

SupplyRobot

August 2, 2021, 17:18

I have a problem where every observation has a binary outcome that occurs at the end of a fixed period, and the predictor variables describe a few types of event that either happen on some day within that period or do not happen at all. For example:

Outcome  Days Until First Phone Call  Days Until Second Phone Call
TRUE     3                            14
FALSE    25                           63
FALSE    16                           NA

Of course I can convert the predictor columns to binary and use logistic …

Topic: survival-analysis regression

Category: Data Science

What is the best way to model survival when the hazard rate decreases over time?

JJ Levine

July 14, 2021, 22:05

The standard survival analysis model - for example the model which forms the basis for the proportional hazards model - assumes the hazard rate is constant. In many applications this would be the exception rather than the rule. What parametric model would be appropriate for data such as this: % retention 70% 80% 85% 90% 90%

Topic: parameter-estimation churn survival-analysis

Category: Data Science
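
To make the question above concrete, the sketch below assumes the quoted figures are period-over-period retention rates, converts them to per-period hazards and a survival curve, and then evaluates a Weibull hazard, one parametric family whose hazard decreases over time when its shape parameter is below 1. The Weibull parameters are arbitrary illustration values, not fitted to the data.

```python
import numpy as np

# Retention figures from the question, read as period-over-period retention rates (an assumption).
retention = np.array([0.70, 0.80, 0.85, 0.90, 0.90])

hazard = 1.0 - retention          # discrete hazard: share of remaining customers lost each period
survival = np.cumprod(retention)  # share of the original cohort still active after each period

def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    t = np.asarray(t, dtype=float)
    return (shape / scale) * (t / scale) ** (shape - 1.0)

periods = np.arange(1, 6)
# Hypothetical parameters: any shape < 1 gives a hazard that falls as t grows.
h_weibull = weibull_hazard(periods, shape=0.5, scale=2.0)
```

A shape of exactly 1 reduces the Weibull to the exponential model with a constant hazard, and a shape above 1 gives an increasing hazard, so the fitted shape parameter itself indicates which regime the data support.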

What are the typical things in the data that I have to look for when implementing survival models using machine learning?

AvidJoe

June 17, 2021, 19:22

Problem Scenario: I am working on an industry-specific problem focused on predicting the failure of a seal/gasket in a given time interval (T) in a high-pressure compression environment. Whenever this seal/gasket breaks, there is a loss of pressure and a leak. This leak is extremely dangerous. The gas in question is H2, which makes things even scarier. The specific problem would be this: "Predict the likelihood of this Seal Surviving past a time Ti provided that the event has not …

Topic: survival-analysis deep-learning predictive-modeling machine-learning

Category: Data Science

CoxPH model with Frailty and L1 regularization

Redratz

June 2, 2021, 10:34

This question stems from an approach proposed by Dr. Silverman, "Predicting Horse Race winners through A Regularized Conditional Logistic Regression with Frailty." In this paper, he proposes a modified Cox Proportional Hazard model including a frailty parameter taken from Muriel Gillick's article, "Guest Editorial: Pinning Down Frailty." The loglikelihood with frailty has the form: Where: $ X^{w}_{rh} $ = characteristics of the horse that won race r $\beta$ are the parameters to be estimated $w^{w}_{rh}$ is the frailty indicator of …

Topic: forecasting survival-analysis rstudio logistic-regression python

Category: Data Science

Survival analysis to estimate kanban tasks completion times

Sharath

March 6, 2021, 00:24

I am working on a problem to estimate task completion time in kanban (project management tool). While doing EDA, I looked at tasks that are either done or cancelled. In this case, I defined the completion time as the time taken from task creation to done/cancelled. I noticed I am running into an issue with that definition. I am disregarding tasks that have not been done yet. If we think of "task = done" as "event = 1", this is …

Topic: time survival-analysis r machine-learning

Category: Data Science

How to do prediction on survival data, using Random Forest

Seydou GORO

January 23, 2021, 15:21

I need to make predictions on survival data using the random forest method. My question is: should I follow the same approach as in logistic regression, taking into account only the status variable, or should I also take into account the time to the event? Are there any specific R functions for survival analysis other than randomForest? Or could I use this function for survival analysis as well? I've seen a function called ranger() that seems to do random forest on …

Topic: prediction survival-analysis random-forest

Category: Data Science

Analyze the effect of some new changes to business rules on customer retention and sales

NewbietoPython

January 7, 2021, 18:24

I am trying to analyze the effect of a particular business rule on customer behavior. Background: I have two call centers operating in my company. One is an in-house call center and the other is a third party. Incoming calls are handled by these two call centers based on some rules. Two months ago we changed some operational rules, after which all the calls will be routed to call center A and then if not attended to call …

Topic: data-analysis statsmodels survival-analysis logistic-regression statistics

Category: Data Science
