How can I calculate total days past due between billing events?
I am dealing with a dataframe with subscription events partitioned by username, subscription status, and relative timestamps. For each of the dates, there are changes in time when the subscription becomes past due and renews as such:
username | subscription_events_name | subscription_events_timestamp |
---|---|---|
A | subscription_charged_unsuccess | 2021-01-08 |
A | subscription_past_due | 2021-01-08 |
A | subscription_past_due | 2021-01-15 |
A | subscription_charged_successfully | 2021-01-16 |
A | subscription_renew | 2021-01-16 |
Say a customer enters past due status, and 15 days later their subscription is billed in full and they enter an active status. I want to store this value in a new column for days_past_due
:
username | subscription_events_name | subscription_events_timestamp | days_past_due |
---|---|---|---|
A | subscription_charged_unsuccess | 2021-01-08 | 0 |
A | subscription_past_due | 2021-01-08 | 0 |
A | subscription_past_due | 2021-01-15 | 7 |
A | subscription_charged_successfully | 2021-01-16 | 0 |
A | subscription_renew | 2021-01-16 | 0 |
I created an expanding window for this, but it only counts values of subscription_events_name
. How can I modify this code or produce new code to achieve this?
df['days_past_due'] = df.groupby([
'username',
'subscription_events_name')[['subscription_events_timestamp']].transform(lambda x: x.expanding().count())
df.sort_values(by=['subscription_events_timestamp'])
Topic pandas python data-cleaning
Category Data Science