What is the best practice for engineering features from prior event counts?

Say for example I am building a model to predict a customer churn event from Spotify, with my target being whether a customer churns in the next 90 days.

One feature I expect might be predictive of this event is how often customers check their billing statements online, so I might engineer features for each customer on each training date that encode how many times they have checked their billing statements.

For example, I might create a feature CHECKBILL_CNT_0_10, which counts how many times the customer has checked their online bill in the last 10 days, along with many similar features across different time ranges.

I have seen two different styles of how data scientists do this:

  1. CHECKBILL_0_10, CHECKBILL_0_30, CHECKBILL_0_90 ...
  2. CHECKBILL_0_10, CHECKBILL_10_30, CHECKBILL_30_90 ...

Both technically encode the same information; however, I'm wondering whether one of these options offers advantages over the other. I'm inclined to think that option 2 would be preferable, since the features would be less correlated and the model might therefore learn more easily, but this is speculative.
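To make the "same information" point concrete, here is a minimal sketch in plain Python. The function names, the window edges (0, 10, 30, 90), the half-open window convention and the example dates are all assumptions made for illustration, not anything from a real pipeline. It computes both styles for one customer and shows that the cumulative counts are just running sums of the disjoint ones.

```python
from datetime import date, timedelta

EDGES = (0, 10, 30, 90)  # assumed window edges, in days before the training date


def disjoint_counts(event_dates, as_of, edges=EDGES):
    """Style 2: counts in non-overlapping windows [lo, hi) days before as_of."""
    counts = {}
    for lo, hi in zip(edges, edges[1:]):
        counts[f"CHECKBILL_{lo}_{hi}"] = sum(
            1 for d in event_dates if lo <= (as_of - d).days < hi
        )
    return counts


def cumulative_counts(event_dates, as_of, edges=EDGES):
    """Style 1: counts in nested windows [edges[0], hi) days before as_of."""
    start = edges[0]
    counts = {}
    for hi in edges[1:]:
        counts[f"CHECKBILL_{start}_{hi}"] = sum(
            1 for d in event_dates if start <= (as_of - d).days < hi
        )
    return counts


def cumulative_from_disjoint(disjoint, edges=EDGES):
    """Rebuild style 1 from style 2 with a running sum."""
    total = 0
    out = {}
    for lo, hi in zip(edges, edges[1:]):
        total += disjoint[f"CHECKBILL_{lo}_{hi}"]
        out[f"CHECKBILL_{edges[0]}_{hi}"] = total
    return out


# Made-up example: bill checks 1, 3, 12, 25, 29, 45 and 120 days before the
# training date, plus one event 2 days after it (ignored, to avoid leakage).
as_of = date(2024, 1, 1)
events = [as_of - timedelta(days=n) for n in (1, 3, 12, 25, 29, 45, 120, -2)]

disjoint = disjoint_counts(events, as_of)
cumulative = cumulative_counts(events, as_of)

print(disjoint)    # {'CHECKBILL_0_10': 2, 'CHECKBILL_10_30': 3, 'CHECKBILL_30_90': 1}
print(cumulative)  # {'CHECKBILL_0_10': 2, 'CHECKBILL_0_30': 5, 'CHECKBILL_0_90': 6}
print(cumulative_from_disjoint(disjoint) == cumulative)  # True
```

Since each style is a linear transformation of the other, a linear model can in principle represent the same function with either one. Models that split on one feature at a time, such as tree ensembles, may still behave differently on the two encodings.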


You may want to try both options and see which one performs better. I think feature engineering is largely an iterative, trial-and-error process.