What is best practice to feature engineer from prior event counts?

Question


user3555243

November 28, 2020, 09:00

Say, for example, I am building a model to predict customer churn for Spotify, with my target being whether a customer churns in the next 90 days.

One behaviour I expect to be predictive of churn is customers checking their billing statements online, so I might engineer features for each customer on each training date that encode how many times they have checked their billing statements.

For example, I might create a feature CHECKBILL_CNT_0_10, which counts how many times this customer has checked their online bill in the last 10 days, along with many such features across different time ranges.
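As a rough sketch of what such a feature might look like in code: the helper below counts events whose age, in days before the training date, falls in a half-open window. The function name `count_events`, the choice of a half-open window, and the sample dates are all assumptions made for illustration, not part of the original setup.

```python
from datetime import date

def count_events(event_dates, as_of, start_days, end_days):
    """Count events whose age in days (relative to as_of) satisfies
    start_days <= age < end_days. Events after as_of have a negative
    age and are never counted, which avoids leaking future data."""
    count = 0
    for d in event_dates:
        age = (as_of - d).days
        if start_days <= age < end_days:
            count += 1
    return count

# Hypothetical bill-check dates for one customer.
bill_checks = [
    date(2020, 11, 27),
    date(2020, 11, 20),
    date(2020, 11, 5),
    date(2020, 9, 15),
]
as_of = date(2020, 11, 28)  # hypothetical training date

checkbill_cnt_0_10 = count_events(bill_checks, as_of, 0, 10)
print(checkbill_cnt_0_10)
```

Calling the same helper with `(0, 30)` or `(10, 30)` would give the other window features.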

I have seen two different styles of how data scientists do this:

Option 1 (cumulative windows): CHECKBILL_0_10, CHECKBILL_0_30, CHECKBILL_0_90 ...
Option 2 (disjoint windows): CHECKBILL_0_10, CHECKBILL_10_30, CHECKBILL_30_90 ...

Both technically encode the same information; however, I'm wondering whether one of these options offers advantages over the other. I'm inclined to think that option 2 is preferable, since its features would be less correlated and the model might therefore learn more easily, but this is speculation.
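To make "encode the same information" concrete, here is a minimal sketch, assuming both styles use the same window boundaries: the disjoint counts are successive differences of the cumulative counts, and a running sum recovers the cumulative counts again. The function names and the sample values are hypothetical.

```python
from itertools import accumulate

def cumulative_to_disjoint(cum):
    """[c_0_10, c_0_30, c_0_90] -> [c_0_10, c_10_30, c_30_90].
    Assumes a non-empty list ordered from shortest to longest window."""
    return [cum[0]] + [b - a for a, b in zip(cum, cum[1:])]

def disjoint_to_cumulative(dis):
    """[c_0_10, c_10_30, c_30_90] -> [c_0_10, c_0_30, c_0_90]."""
    return list(accumulate(dis))

# Hypothetical counts for one customer:
# CHECKBILL_0_10, CHECKBILL_0_30, CHECKBILL_0_90
cumulative = [2, 3, 4]

disjoint = cumulative_to_disjoint(cumulative)
round_trip = disjoint_to_cumulative(disjoint)
print(disjoint, round_trip)
```

Because the mapping is invertible, neither style loses information; what differs is how strongly the resulting columns are correlated with each other.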

Topic time feature-engineering

Category Data Science

Gozie · Accepted Answer · October 29, 2020, 06:54



You may want to try both options and see which performs better. Feature engineering is, I think, largely an iterative, trial-and-error process.
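One way such a comparison might be set up is sketched below, assuming scikit-learn and NumPy are available. The data is synthetic and the model choice (a random forest scored by cross-validated ROC AUC) is only an assumption for illustration; with real churn data, a split by time would likely be more appropriate than random folds.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500

# Synthetic disjoint window counts (hypothetical data).
# Columns: CHECKBILL_0_10, CHECKBILL_10_30, CHECKBILL_30_90
disjoint = rng.poisson(lam=[1.0, 2.0, 4.0], size=(n, 3))

# Cumulative counts derived from the same events.
# Columns: CHECKBILL_0_10, CHECKBILL_0_30, CHECKBILL_0_90
cumulative = disjoint.cumsum(axis=1)

# Synthetic target: churn is made more likely by recent bill checks.
p = 1.0 / (1.0 + np.exp(-(disjoint[:, 0] - 1.0)))
y = (rng.random(n) < p).astype(int)

def cv_auc(X):
    """Mean 5-fold cross-validated ROC AUC for one feature encoding."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

scores = {"cumulative": cv_auc(cumulative), "disjoint": cv_auc(disjoint)}
print(scores)
```

Whichever encoding scores better here says nothing about a real dataset; the point is only that both can be evaluated with the same model and the same folds.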
