Churn prediction model doesn't predict well on real data
I am currently working on a churn prediction problem.
As input I use data from a data warehouse for the period 08/2016 - 03/2021 (one row per month for each customer).
From this data I built a time window of 18 months over which I track customer behaviour (feature engineering).
Based on these features, I predict churn over the following 4 months, 12/2020 - 03/2021.
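To make the windows concrete, here is a minimal sketch of the month arithmetic. It assumes the 18-month feature window ends in the month just before the 4-month label window starts; that alignment, and the helper names `add_months` and `windows`, are illustrative assumptions rather than the exact pipeline.

```python
def add_months(year, month, delta):
    """Shift a (year, month) pair by delta months (delta may be negative)."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def windows(label_start, feature_months=18, label_months=4):
    """Return ((feature_start, feature_end), (label_start, label_end)).

    Assumption: the feature window ends the month before the label window.
    """
    year, month = label_start
    feature_start = add_months(year, month, -feature_months)
    feature_end = add_months(year, month, -1)
    label_end = add_months(year, month, label_months - 1)
    return (feature_start, feature_end), (label_start, label_end)


# Label window used for training/testing: 12/2020 - 03/2021.
train_windows = windows((2020, 12))
# Label window used for the real prediction: 04/2021 - 07/2021.
scoring_windows = windows((2021, 4))

print("train  :", train_windows)
print("scoring:", scoring_windows)
```

Writing the windows down this way makes it easy to check that no month from a label window leaks into the features built for that same window.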
As a model I use LightGBM with the following parameters:
parameters = {
    'objective': 'binary',
    'metric': 'auc',
    'is_unbalance': 'true',
    'boosting': 'gbdt',
    'num_leaves': 31,
    'feature_fraction': 0.5,
    'bagging_fraction': 0.5,
    'bagging_freq': 20,
    'learning_rate': 0.05,
    'verbose': 0
}
On the test data (80/20 train/test split) I get the following classification report:
              precision    recall  f1-score   support

           0       0.96      0.93      0.95     48008
           1       0.68      0.80      0.73      8745

    accuracy                           0.91     56753
   macro avg       0.82      0.86      0.84     56753
weighted avg       0.92      0.91      0.91     56753
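As a sanity check on the report, the approximate confusion matrix can be rebuilt from the support and recall columns. This is only a sketch: the recalls are rounded to two decimals in the report, so the counts below are approximate.

```python
# Values read from the classification report above.
support_0, support_1 = 48008, 8745
recall_0, recall_1 = 0.93, 0.80  # rounded, so the counts are approximate

tp = round(recall_1 * support_1)  # churners predicted as churners
tn = round(recall_0 * support_0)  # non-churners predicted as non-churners
fn = support_1 - tp               # churners the model missed
fp = support_0 - tn               # non-churners flagged as churners

precision_1 = tp / (tp + fp)
accuracy = (tp + tn) / (support_0 + support_1)
churn_rate = support_1 / (support_0 + support_1)

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"precision(1)={precision_1:.2f} accuracy={accuracy:.2f}")
print(f"churn rate in test set={churn_rate:.3f}")
```

The recomputed precision for class 1 and the overall accuracy agree with the report after rounding, so the table is internally consistent.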
In the real scenario I use the period 08/2016 - 03/2021 to create the features and predict churn for the next 4 months (04/2021 - 07/2021).
In the last step I build a dataset of the clients who were active in 03/2021 and churned within those 4 months (04/2021 - 07/2021), about 1700 customers.
When I compare the model's predictions with the actual outcomes for these churned customers, the model has 44% accuracy. It correctly identifies only 844 of the 1700 customers.
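Because this comparison set contains only customers who actually churned, the figure measured here is recall for the churn class, so the comparable number from the test report is 0.80 and not the overall accuracy of 0.91. A small sketch with the counts quoted above; note that "about 1700" is approximate, which may be why the ratio below does not land exactly on the quoted 44%.

```python
# Counts quoted in the post; the total is approximate ("about 1700").
churned_total = 1700
churned_caught = 844

# With only true churners in the set, correct / total is recall for class 1.
holdout_recall = churned_caught / churned_total

# Recall for class 1 from the test-set classification report.
test_recall = 0.80

print(f"recall on real churners: {holdout_recall:.3f}")
print(f"recall on test split:    {test_recall:.3f}")
print(f"gap:                     {test_recall - holdout_recall:.3f}")
```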
I cannot find the reason for such a large gap between the test results and the model's performance on real predictions. Has anybody had a similar experience?
Thanks for any useful suggestions!
Here are the numbers of features and observations:
293552 rows × 152 columns
Number of non-churners: 242385
Number of churners: 51167
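For reference, the class balance implied by these counts can be computed directly. The negative-to-positive ratio is the value commonly passed to LightGBM's `scale_pos_weight`, which is an alternative to `is_unbalance` (the two are not meant to be combined); whether that helps here is an open question, not something I have tested.

```python
n_non_churners = 242385
n_churners = 51167
n_rows = n_non_churners + n_churners  # should equal the 293552 rows above

churn_share = n_churners / n_rows
neg_pos_ratio = n_non_churners / n_churners  # candidate scale_pos_weight value

print(f"rows={n_rows}")
print(f"churn share={churn_share:.4f}")
print(f"negative/positive ratio={neg_pos_ratio:.3f}")
```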
I will try cross-validation and the suggested metrics for churn.
One more question:
What is the best method to determine the threshold in this situation? At the moment I use exactly what you said: 50% or more = churn, below 50% = not churn.
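To illustrate what moving the threshold does, here is a minimal sketch that sweeps a few cut-offs and reports precision and recall for the churn class. The labels and probabilities are made-up toy values, and `precision_recall_at` is a hypothetical helper; this shows the trade-off, not a recommendation of a particular threshold.

```python
def precision_recall_at(threshold, y_true, y_prob):
    """Precision and recall for class 1 when predicting 1 if prob >= threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy validation data (invented for illustration only).
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.45, 0.90]

for threshold in (0.3, 0.5, 0.7):
    precision, recall = precision_recall_at(threshold, y_true, y_prob)
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

Lowering the threshold catches more churners at the cost of more false alarms, so the choice usually depends on the relative cost of a missed churner versus an unnecessary retention action, evaluated on data the model was not trained on.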
