Predictive modeling when output affects future input

Assume I have a model that predicts the number of ice creams sold in a store.

The model is trained on data from the last 5 years, with the most recent year held out as a validation set, and it has produced very good results.

We now put the model into production so that the CFO can create an estimate for the upcoming year's budget. The CFO looks at the prediction for May, say 2,000 ice creams, and thinks, "Ooh... I was hoping for more sales in May. I'll go for 4,000." He therefore orders more advertising, introduces new flavors, etc., and reaches the 4,000 ice creams sold at the end of May, as he was hoping.

On the first of June, we talk to the CFO to evaluate the model after the first six months, and we see that our prediction for May is off by 100%!

This spike can be explained by the increased advertising etc., and on all the other days the model has done really well. But if the CFO starts tweaking the advertising, flavors, etc. each day to hit the budget, how will we ever be able to test whether our model is indeed good in production/the real world? And how will we be able to re-train the model? The first 5 years of sales were without any human influence, whereas after a year the sales have been influenced by advertising, etc., so the spike in May is not natural but is due to some exogenous variable we are not able to incorporate (e.g., we don't know the CFO's budget).

Topic: concept-drift, bias, methodology, predictive-modeling, machine-learning

Category: Data Science


The simpler, more practical, and more business-oriented way to go would be to include advertising in your model. That would allow you to:

  • Change your prediction accordingly, and thus your measured performance. The key is to be transparent about it, even pedagogical. Basically, in your example, you had 6 months to review your prediction and inform the CFO so that he doesn't think your model is off. If the problem is the absence of communication on his part, you can formulate it that way: more communication from his side would allow for better predictions. If he is not willing to do so, that's on him.

  • Help evaluate the advertising needed. With a simple model, you could open some interesting discussions with your CFO. Basically, if you predict 2,000 and he wants to sell 4,000, you can try to work out what change in the advertising feature would lead to a 2,000-sale increase (see the sketch after this list). That would let you discuss when advertising is worth it, and it might open the discussion about the first point (prediction updates).
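
To make the second point concrete, here is a minimal sketch, assuming a simple linear model with a made-up advertising-spend feature (all of the data, column choices, and coefficients below are invented for illustration, not the answer's actual model): fit the model with advertising included, predict May at the planned ad spend, then invert the fitted advertising coefficient to estimate how much extra advertising would be needed to close the gap to the CFO's 4,000 target.

```python
# Sketch: include advertising as a feature, then invert the fitted
# relationship to ask "how much extra advertising would be needed to
# reach the target?". All numbers here are synthetic toy data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
months = np.tile(np.arange(1, 13), 5)                    # 5 years of monthly data
ad_spend = rng.uniform(0, 5_000, size=months.size)       # advertising budget in $
seasonality = 1_500 + 1_000 * np.sin((months - 4) / 12 * 2 * np.pi)
sales = seasonality + 0.4 * ad_spend + rng.normal(0, 150, months.size)

# Features: a simple seasonal encoding plus the advertising spend.
X = np.column_stack([np.sin(months / 12 * 2 * np.pi),
                     np.cos(months / 12 * 2 * np.pi),
                     ad_spend])
model = LinearRegression().fit(X, sales)

# Prediction for next May at the currently planned (baseline) ad spend.
may, baseline_ad = 5, 1_000.0
x_may = np.array([[np.sin(may / 12 * 2 * np.pi),
                   np.cos(may / 12 * 2 * np.pi),
                   baseline_ad]])
predicted = model.predict(x_may)[0]

# If the model is roughly linear in ad spend, the extra advertising needed
# to close the gap to the target is simply gap / coefficient.
target = 4_000
ad_coef = model.coef_[2]
extra_ad = (target - predicted) / ad_coef if ad_coef > 0 else float("nan")
print(f"Predicted May sales at baseline advertising: {predicted:,.0f}")
print(f"Extra ad spend suggested to reach {target:,}: ~${extra_ad:,.0f}")
```

With something like this in place, the CFO's planned advertising becomes an input you can ask for, so the forecast can be updated before May instead of being judged against a figure it never saw.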


I am afraid that such situations are fundamentally inherent in prediction/forecasting contexts; quoting from the very recent paper by Taleb et al., "On single point forecasts for fat-tailed variables" (open access, para. 3.7):

3.7. Forecasts can result in adjustments that make forecasts less accurate

It is obvious that if forecasts lead to adjustments, and responses that affect the studied phenomenon, then one can no longer judge these forecasts on their subsequent accuracy.

So, other than communicating this clearly beforehand and reaching an agreement on how the predictions will be assessed in the presence of such adjustments, there is not much else you can do from a modelling or methodology perspective. The advice given further on in the same paragraph quoted above:

In that sense a forecast can be a warning of the style “if you do not act, these are the costs”.

can form the basis of such a communication and agreement.
