Logistic Regression with Heterogeneous Historical Clusters of Customer Usage
I would like to train a churn model based on daily customer usage of a service - among other features - to predict if they are likely to churn.
The problem I am facing is that the amount of historical usage data varies from one customer to another depending on their contract date: some have been subscribers for months, others only for weeks.
This makes it difficult to work with the usual fixed set of dimensions for feature engineering, as there is an additional dynamic/time aspect to the dataset.
Static features are straightforward to model: you have your usual matrix of features and off you go training.
But in my case, I would also like to integrate in my model the historical behavior of each client in some way.
For example: has the decline in service usage been abrupt or progressive during the last N months, with N depending on the age of the contract? This is the kind of information I would like to integrate into the model.
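To make the idea concrete, here is a minimal sketch of the kind of fixed-length summary I have in mind, not a definitive approach. It assumes each customer's history is a plain list of daily usage values, oldest first. The function name `usage_trend_features`, the `max_window` cap and the feature names are all hypothetical. The least-squares slope captures the overall trend, and the share of the total decline caused by the single largest one-day drop is one possible way to separate abrupt from progressive attrition.

```python
from statistics import mean


def usage_trend_features(daily_usage, max_window=90):
    """Summarise a variable-length usage history as a fixed set of features.

    daily_usage: list of daily usage values, oldest first (assumed format).
    max_window:  hypothetical cap on how many recent days are used.
    """
    # Use whatever history exists, up to max_window days.
    window = daily_usage[-max_window:]
    n = len(window)
    if n < 2:
        # Not enough history to estimate a trend.
        return {"n_days": n, "slope": 0.0, "max_drop_share": 0.0}

    # Ordinary least-squares slope of usage against the day index.
    x_mean = (n - 1) / 2
    y_mean = mean(window)
    cov = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(window))
    var = sum((i - x_mean) ** 2 for i in range(n))
    slope = cov / var

    # Day-to-day decreases only (increases count as zero).
    drops = [max(window[i] - window[i + 1], 0.0) for i in range(n - 1)]
    total_drop = sum(drops)
    # Close to 1: one abrupt drop. Close to 1 / (n - 1): steady decline.
    max_drop_share = max(drops) / total_drop if total_drop > 0 else 0.0

    return {"n_days": n, "slope": slope, "max_drop_share": max_drop_share}


abrupt = usage_trend_features([10, 10, 10, 10, 0, 0, 0, 0])
progressive = usage_trend_features([10, 9, 8, 7, 6, 5])
```

Each customer then contributes one row with the same columns whatever the contract age, and `n_days` lets the model account for how much history each summary is based on.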
Any suggestion would be greatly appreciated.
Topic churn time-series machine-learning
Category Data Science