Survival analysis to estimate kanban task completion times

I am trying to estimate task completion time on a kanban board (a project management tool). During EDA, I looked only at tasks that are either done or cancelled, and I defined completion time as the time elapsed from task creation until the task was marked done or cancelled.

I noticed a problem with that definition: it disregards tasks that have not been finished yet. If we treat task = done as event = 1, this is like throwing away the observations with event = 0 (the censored ones) in survival analysis, which biases the result.
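To make the event = 0 framing concrete, here is a minimal sketch of a Kaplan-Meier estimate written in plain Python. The Kaplan-Meier estimator is not named in the post; it is used here only as one standard way to keep event = 0 observations, and the `durations` and `events` values are made-up illustration data, not real kanban data. The sketch compares the estimated share of tasks still open over time when open tasks are kept as censored observations versus when they are thrown away.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] for each time t at which at least one task was done.

    durations: observed days per task (creation to done, or creation to "now"
               for tasks that are still open)
    events:    1 if the task was done at that time, 0 if it is still open
               (right-censored)
    """
    done_at = Counter(t for t, e in zip(durations, events) if e == 1)
    leaving_at = Counter(durations)  # tasks leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = done_at.get(t, 0)
        if d > 0:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at[t]
    return curve


# Hypothetical data: days observed per task and whether it was done (1) or is still open (0).
durations = [2, 3, 3, 5, 8, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1, 0, 0]

# Estimate that keeps the open tasks as censored observations.
curve_all = kaplan_meier(durations, events)

# Estimate that throws the open tasks away (the approach described above).
done_durations = [d for d, e in zip(durations, events) if e == 1]
curve_done = kaplan_meier(done_durations, [1] * len(done_durations))

for (t, s_all), (_, s_done) in zip(curve_all, curve_done):
    print(f"day {t}: still open (with censoring) = {s_all:.2f}, (done only) = {s_done:.2f}")
```

On this toy data the done-only curve drops to zero by the last observed completion, while the curve that keeps the open tasks does not, which is the bias described above.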

  • How should I handle this?
  • I would also like some input on how I should approach done vs. cancelled.

Topic time survival-analysis r machine-learning

Category Data Science


It's a matter of defining exactly which problem you want to solve, and there might be many variants:

  • If the goal is really to estimate "completion time", then imho you should use only completed tasks, since the other tasks haven't been "completed". Note that in this case you're counting time actually spent on the task.
  • If the goal is to estimate "time to resolve the task", whether by completing it or cancelling it, then you're counting the duration between the time the task was created and the time it was either completed or cancelled. Note that in this case the duration may include time spent on other tasks.
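The two variants differ only in which closing statuses count. A minimal sketch, assuming each task is a hypothetical `(status, created_at, closed_at)` record and that duration is measured as elapsed calendar time from creation (the field names and the sample data are illustrative assumptions):

```python
from datetime import datetime
from statistics import median

# Hypothetical task records: (status, created_at, closed_at)
tasks = [
    ("done", datetime(2023, 1, 1), datetime(2023, 1, 4)),
    ("done", datetime(2023, 1, 2), datetime(2023, 1, 10)),
    ("cancelled", datetime(2023, 1, 3), datetime(2023, 1, 5)),
    ("open", datetime(2023, 1, 6), None),
]


def durations_in_days(tasks, closing_statuses):
    """Days from creation to closing, for tasks whose status is in closing_statuses."""
    return [
        (closed - created).total_seconds() / 86400
        for status, created, closed in tasks
        if status in closing_statuses and closed is not None
    ]


# Variant 1: only completed tasks count.
completion_days = durations_in_days(tasks, {"done"})

# Variant 2: completed or cancelled tasks both count as resolved.
resolution_days = durations_in_days(tasks, {"done", "cancelled"})

print("median completion time (days):", median(completion_days))
print("median resolution time (days):", median(resolution_days))
```

Neither variant uses the task that is still open, which is the limitation discussed next.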

In both cases above, I don't see any proper way to include tasks that are still pending. For those, my idea would be to calculate a different statistic, for instance something like the "rate of completed tasks after X days".
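One possible reading of "rate of completed tasks after X days" is the share of tasks completed within X days of creation, counted only over tasks that are at least X days old so that every task in the denominator has had the full window. A minimal sketch under that assumption (the function name `completion_rate_within`, the `(created, done)` record layout, and the sample dates are all hypothetical):

```python
from datetime import date


def completion_rate_within(tasks, x_days, today):
    """Share of tasks completed within x_days of creation.

    Only tasks created at least x_days before `today` are counted, so pending
    tasks are included in the denominator once they are old enough.
    Returns None when no task is old enough to be counted.
    """
    eligible = [
        (created, done)
        for created, done in tasks
        if (today - created).days >= x_days
    ]
    if not eligible:
        return None
    completed = sum(
        1
        for created, done in eligible
        if done is not None and (done - created).days <= x_days
    )
    return completed / len(eligible)


# Hypothetical tasks: (created, done), with done=None for pending tasks.
tasks = [
    (date(2023, 1, 1), date(2023, 1, 4)),   # done after 3 days
    (date(2023, 1, 2), date(2023, 1, 20)),  # done after 18 days
    (date(2023, 1, 5), None),               # still pending
    (date(2023, 1, 28), None),              # pending, too recent for a 7-day window
]
today = date(2023, 1, 31)

rate_7 = completion_rate_within(tasks, 7, today)
print("completed within 7 days:", rate_7)
```

Here the pending task created on 2023-01-05 counts against the 7-day rate, while the one created on 2023-01-28 is left out because it has not yet had 7 days.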
