Predict time of dispatch for marketing campaign

What would be appropriate models/algorithms/strategies for predicting the best individual send times for marketing campaigns, based on past response timestamps?

Data:

Given for example

===========================================================
  customer campaign    campaign_time       response_time   
-----------------------------------------------------------
1   100       a     2017-01-01 06:50:01 2017-01-01 08:02:21
2   101       a     2017-01-01 06:50:01 2017-01-01 16:45:31
3   101       a     2017-01-01 06:50:01 2017-01-02 07:20:00
4   100       b     2017-01-07 06:30:21 2017-01-08 08:15:21
5   101       b     2017-01-07 06:30:21 2017-01-07 17:00:12
6   100       c     2017-01-14 06:43:55 2017-01-14 07:59:44
7   101       d     2017-01-21 14:02:01 2017-01-21 16:50:01
-----------------------------------------------------------
  • two customers, 100 and 101,
  • four past campaigns, a-d,
  • with each campaign having a different time of dispatch,
  • and multiple, one, or no response time(s) (e.g. buying a product) per customer and campaign.

Goal:

Assuming that

  1. campaign_time can vary for 100 and 101 (personalized times of dispatch), and
  2. past response times are an indicator for when customers are most receptive for a campaign

I would like to predict the best next campaign_time (2017-01-28 ??:??:??) for each customer based on past response_times, so that the number of respondents per campaign is maximized.

Does anyone have experience with something similar, or any ideas about where to start? I'd be happy to hear them.

To simplify things, I'd consider the first response_time the most valuable one (=> the one to predict), and I'd also abstract away from weekdays (=> it's about predicting a time between 0:00 and 23:59, marked by the ? above). However, it would be nice to have a continuous prediction instead of a discretized one (as suggested here).
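One practical subtlety with predicting a time of day is that it is circular: 23:59 and 0:01 are two minutes apart, not nearly a full day. A minimal sketch of averaging past response times on the 24-hour circle (plain Python; the timestamps are customer 100's responses from the sample data):

```python
import math
from datetime import datetime

def circular_mean_hour(timestamps):
    """Average times of day on the 24 h circle via a unit-vector mean,
    so e.g. 23:00 and 01:00 average to midnight, not noon."""
    angles = [
        2 * math.pi * (t.hour + t.minute / 60 + t.second / 3600) / 24
        for t in timestamps
    ]
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    mean_angle = math.atan2(s, c) % (2 * math.pi)
    # Convert the mean angle back to a fractional hour in [0, 24)
    return (24 * mean_angle / (2 * math.pi)) % 24

# Customer 100's past response times from the sample data
times_100 = [
    datetime(2017, 1, 1, 8, 2, 21),
    datetime(2017, 1, 8, 8, 15, 21),
    datetime(2017, 1, 14, 7, 59, 44),
]
```

This gives a continuous (not discretized) candidate time, which matches the goal above; for responses spread near midnight a plain arithmetic mean of hours would be badly wrong, while the circular mean is not.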

Topic marketing predictive-modeling machine-learning

Category Data Science



As others have already answered, I would like to add a few points:

  • What about feature engineering? We can derive a lot of additional features from the given data-set:

    • week differences between campaigns,
    • calendar parts such as day, month, etc.,
    • time differences between dispatch and response.

And then create group-by views of the data-set as well.
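A minimal sketch of such feature engineering in pandas, using the sample data from the question (the derived column names are just illustrative):

```python
import pandas as pd

# The sample data from the question
df = pd.DataFrame({
    "customer": [100, 101, 101, 100, 101, 100, 101],
    "campaign": ["a", "a", "a", "b", "b", "c", "d"],
    "campaign_time": pd.to_datetime([
        "2017-01-01 06:50:01", "2017-01-01 06:50:01", "2017-01-01 06:50:01",
        "2017-01-07 06:30:21", "2017-01-07 06:30:21",
        "2017-01-14 06:43:55", "2017-01-21 14:02:01",
    ]),
    "response_time": pd.to_datetime([
        "2017-01-01 08:02:21", "2017-01-01 16:45:31", "2017-01-02 07:20:00",
        "2017-01-08 08:15:21", "2017-01-07 17:00:12",
        "2017-01-14 07:59:44", "2017-01-21 16:50:01",
    ]),
})

# Calendar features of the dispatch
df["send_hour"] = df["campaign_time"].dt.hour
df["send_weekday"] = df["campaign_time"].dt.dayofweek
df["send_month"] = df["campaign_time"].dt.month

# Lag between dispatch and response, in hours
df["response_lag_h"] = (
    df["response_time"] - df["campaign_time"]
).dt.total_seconds() / 3600

# A group-by view: per-customer summary of response behaviour
per_customer = df.groupby("customer").agg(
    n_responses=("response_time", "count"),
    mean_lag_h=("response_lag_h", "mean"),
    mean_response_hour=("response_time", lambda s: s.dt.hour.mean()),
)
```

With real data, `response_time` would contain NaT for non-responders, which `count` and the mean aggregations skip automatically.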

  • After that, we can use a random forest to see the feature importances of the different attributes/columns.
  • You can also use dendrograms (hierarchical clustering) to see which columns are irrelevant.
  • And the process continues with different models, EDA, and so on.
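The random-forest importance step might look like the following sketch with scikit-learn; the data here is synthetic (the label depends only on `send_hour` by construction, so the forest should rank that feature first):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy stand-in for the engineered feature matrix: send hour, weekday, month.
n = 500
X = np.column_stack([
    rng.integers(0, 24, n),   # send_hour
    rng.integers(0, 7, n),    # send_weekday
    rng.integers(1, 13, n),   # send_month
])
# Synthetic label: this customer "responds to morning sends"
y = (X[:, 0] < 12).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = dict(zip(["send_hour", "send_weekday", "send_month"],
                       rf.feature_importances_))
```

On real data you would train on the engineered features above and read off which columns carry signal before deciding what to keep.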

As there are important open questions about the scenario and the data, I'm sharing some thoughts, together with assumed answers to some of those questions, rather than a complete solution.

First of all, in the sample data it is not clear how cases where a user has not responded are modelled, and/or whether each campaign was sent to all users (which in real life is never the case, as the user base changes over time). But we can safely assume this information is available: either every campaign was sent to all users (for simplicity), or we know which users received which campaign.

Then it can easily be imagined that some campaigns are better suited to being sent earlier in the day (e.g. B2B services, during working hours) and some rather in the evening (e.g. last-minute trip offers). This can heavily influence the response rate for many users, and we have no information about the campaigns themselves: no features with which to model a campaign's content and its impact on responsiveness. If we knew that the same type of campaign was always sent, but at various times to various users, that could reveal a global (audience-wide) trend in response rate at different times of the day.

A second aspect that influences responsiveness globally is the quality of the campaign: how appealing it is to users. That could be partially inferred by observing the responsiveness of users who receive different campaigns at the same time of day.

That poses the question of what the distribution of send times looks like, for each campaign separately and in comparison between campaigns, and whether there is enough data to infer that information.

Assuming, for starters, that the campaigns are of similar quality and time-of-day agnostic, we can focus on each user separately.

If you look at the distribution of the number of responses over the time of day (e.g. averaged over the number of campaigns sent at a given point in time), you can spot the maximum, which is the best one-shot candidate for the next campaign. This approach, however, has the following caveats:

  • For the longer term, it will create a "time point bubble" for the user: always sending to them at this one time point would not reflect changes in the user's preferences. Here you could apply a moving average, e.g. over a fixed time window (like "last 3 months"), and/or occasionally try other time points, and probably other techniques, to gain a more diverse and future-proof strategy.
  • The maximum time point might reflect exceptional behaviour, while there is much more evidence for a slightly smaller number of responses supported by more time points in another time range.
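The per-user distribution idea could be sketched like this: histogram responses by hour of day, smooth with a circular moving average so that a single outlier spike does not automatically win, and take the peak (NumPy; the hour values come from customer 101's responses in the sample data):

```python
import numpy as np

def best_send_hour(response_hours, window=3):
    """Histogram responses by hour of day, apply a circular moving
    average (so hour 23 neighbours hour 0), and return the peak hour."""
    counts = np.bincount(np.asarray(response_hours) % 24, minlength=24)
    kernel = np.ones(window) / window
    # Wrap the histogram ends so the convolution is circular
    pad = window // 2
    padded = np.concatenate([counts[-pad:], counts, counts[:pad]])
    smoothed = np.convolve(padded, kernel, mode="valid")
    return int(np.argmax(smoothed))

# Customer 101's response hours from the sample data: 16, 7, 17, 16
```

The smoothing window plays the role of the evidence-pooling described above: two hours with one response each, side by side, can outrank a lone spike elsewhere.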

To take the varying quality of campaigns into account, a campaign rating could be created by comparing the responsiveness of users to whom more than one campaign was sent at the same time point. The number of responses in a given user's distribution could then be weighted by the inverse of each campaign's rating.


I might suggest a change to your theoretical setup:

It sounds like you are trying to maximize a function f(t), where t is the time of dispatch and f(t) is the risk-adjusted chance of purchasing (which, as you note, can range from zero to many purchases).

In that case, it is likely better to use a setup like this:

===========================================================
  customer campaign    campaign_time       total_responses_within_1week   
-----------------------------------------------------------
1   100       a     2017-01-01 06:50:01       0
2   101       a     2017-01-01 06:50:01       3
3   101       a     2017-01-01 06:50:01       2
4   100       b     2017-01-07 06:30:21       1
5   101       b     2017-01-07 06:30:21       1
6   100       c     2017-01-14 06:43:55       1
7   101       d     2017-01-21 14:02:01       1
-----------------------------------------------------------

Unless you have specific internal data or published research that directly indicates a relationship between speed of response and the likelihood/number of purchases, I might avoid including it.

Now, if you need to estimate the time of response for issues like labor or inventory planning, you could use a probabilistic model, though I'll avoid making specific recommendations there, as my relative strength is in regression models. You'd want the function to output a probability curve over a given time period, and then sum all of those curves together (e.g. to estimate that on Wednesday morning you will have 80-109 orders and therefore need 5 customer-service reps).
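A sketch of the curve-summing idea, with entirely synthetic numbers: give each customer a distribution over *when* they respond (24 hourly bins), scale it by an assumed overall response probability, and sum across customers to get expected order volume per hour:

```python
import numpy as np

rng = np.random.default_rng(2)

n_customers = 200
overall_p = 0.3  # assumed probability that a customer responds at all

# Each row: a customer's distribution of when they respond, over 24 hourly
# bins (rows sum to 1), scaled so each row sums to overall_p.
curves = rng.dirichlet(np.ones(24), size=n_customers) * overall_p

# Summing the per-customer curves gives expected orders per hour of day,
# usable for staffing/inventory estimates.
expected_orders = curves.sum(axis=0)
```

Both `overall_p` and the Dirichlet shapes are placeholders; in practice each customer's curve would come from the fitted probabilistic model.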

If your goal is purely response frequency (within a relevant time window), however, the time of response doesn't need to be modeled.
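The f(t) framing above could be sketched as a regression on a cyclical (sin/cos) encoding of the send hour, maximized over a grid of candidate dispatch times; the training data below is synthetic, with responses peaking for sends around 08:00:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

def encode_hour(h):
    """Cyclical encoding so the model sees hour 23 as adjacent to hour 0."""
    a = 2 * np.pi * np.asarray(h, dtype=float) / 24
    return np.column_stack([np.sin(a), np.cos(a)])

# Synthetic history: response counts peak for sends around 08:00
send_hours = rng.integers(0, 24, 300)
responses = np.cos(2 * np.pi * (send_hours - 8) / 24) + rng.normal(0, 0.1, 300)

model = LinearRegression().fit(encode_hour(send_hours), responses)

# Maximize f(t) over a fine grid of candidate dispatch times
grid = np.arange(0, 24, 0.25)
best_t = grid[np.argmax(model.predict(encode_hour(grid)))]
```

The grid search gives an effectively continuous answer, as the question asks for; with real data the target column would be something like `total_responses_within_1week`, and the linear model could be swapped for any regressor.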
