Forecasting non-negative sparse time-series data

I have a time-series dataset (daily frequency) representing the sales of a product to a customer over time. The sales are represented as follows:

$$[0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 17, 0, 0, 0, 0, 9, 0, ...]$$

in which each number represents the sales of the product in a day.

The problem is that time-series forecasting methods (ARMA, Holt-Winters) work well for "continuous" and "smooth" data, but they do not produce good results in this case.

I want to forecast this series, paying attention to two points: (1) ensuring non-negative values and (2) handling sparse/non-continuous data. Does anyone know how to approach this problem? Which methods or techniques apply?

Thanks!

Topic forecast time-series

Category Data Science


In this type of data, the information comes from two places:

  1. Time interval between sales $T_i$: time interval between $Sale_{i-1}$ and $Sale_i$
  2. Amount of $Sale_i$: $Y_i$
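
A minimal sketch of extracting these two quantities from the daily series in the question. The function name `to_events` and the convention of measuring the first interval from the day before the series starts are assumptions made for illustration.

```python
def to_events(series):
    """Split a daily sales series into (intervals, amounts).

    intervals[i] is the number of days between sale i-1 and sale i;
    amounts[i] is the quantity sold at sale i.
    """
    intervals, amounts = [], []
    last_day = -1  # assumption: the first interval is counted from the day before the series starts
    for day, qty in enumerate(series):
        if qty > 0:
            intervals.append(day - last_day)
            amounts.append(qty)
            last_day = day
    return intervals, amounts


sales = [0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 17, 0, 0, 0, 0, 9, 0]
T, Y = to_events(sales)
print(T)  # [5, 8, 5, 5]
print(Y)  # [24, 4, 17, 9]
```

Both resulting sequences are dense and strictly positive, which is what makes them easier to model than the raw series.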

As in a previous answer, I suggest starting by checking the time intervals for any patterns. Next, check how the sale amount $Y_i$ relates to $Y_{i-1}$, $T_i$ and $T_{i-1}$. In the most general case, the two quantities (intervals and amounts) are correlated. Based on this, you can decide whether to model them independently or jointly. One commonly used option is a state space model. The basic idea is to decompose the sparse time series into sub-components that are not sparse and are easier to model.

One general model could be:

$$T_i = g(T_{i-1}, Y_{i-1}) + e_i,$$

$$Y_i = f(Y_{i-1}, T_i) + h_i$$
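
As a minimal illustration, assuming linear forms for $g$ and $f$ (the coefficients $a_k$ and $b_k$ are hypothetical and not part of the model above), the two equations could be written as:

$$T_i = a_0 + a_1 T_{i-1} + a_2 Y_{i-1} + e_i,$$

$$Y_i = b_0 + b_1 Y_{i-1} + b_2 T_i + h_i$$

Each equation could then be fitted by ordinary least squares on the observed $(T_i, Y_i)$ pairs. Fitting the same forms to $\log T_i$ and $\log Y_i$ and exponentiating the predictions is one way to keep the forecasts strictly positive.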


I have two ideas here, maybe they will be helpful.

Idea 1: Model time between events

You might think of your data as being generated by two processes: the first is a distribution over time intervals, and the second is a distribution over purchase amounts. So to model your data you could fit one distribution (Gaussian?) to the nonzero values in your dataset, and another (Poisson?) to the lengths of the runs of zeros.
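
A rough sketch of this idea using only the standard library. Two choices here are assumptions rather than part of the suggestion above: purchase amounts are resampled from the observed nonzero values instead of drawn from a Gaussian (so simulated sales can never be negative), and the simulation starts as if a sale had just happened on the last observed day. All function names are made up for illustration.

```python
import math
import random


def zero_runs_and_amounts(series):
    """Return the length of the zero run before each sale, and the sale amounts."""
    runs, amounts, run = [], [], 0
    for qty in series:
        if qty == 0:
            run += 1
        else:
            runs.append(run)
            amounts.append(qty)
            run = 0
    return runs, amounts


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(series, horizon, rng):
    """Simulate `horizon` future days as alternating zero runs and sales."""
    runs, amounts = zero_runs_and_amounts(series)
    lam = sum(runs) / len(runs)  # Poisson maximum-likelihood estimate: the sample mean
    path = []
    while len(path) < horizon:
        path.extend([0] * sample_poisson(lam, rng))  # days without a sale
        path.append(rng.choice(amounts))             # empirical resampling, always > 0
    return path[:horizon]


sales = [0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 17, 0, 0, 0, 0, 9, 0]
rng = random.Random(0)
paths = [simulate(sales, 30, rng) for _ in range(1000)]
# Average over simulated paths gives an expected-sales forecast per future day.
expected_daily = [sum(p[d] for p in paths) / len(paths) for d in range(30)]
```

Each simulated path is sparse and non-negative by construction, and the spread across paths gives a sense of forecast uncertainty that a single point forecast would hide.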

Idea 2: Model customer inventory

Even though the sales events in your dataset are sparse, you could spend a little time to come up with a model of why the customer is making purchases when they do. In one possible model, the customer has an inventory that shrinks over time, and they make purchases when their inventory crosses some minimum threshold. You could use your sales data to fit the slope (for linear shrinkage) or rate (for exponential shrinkage) as well as the threshold.

This could get arbitrarily complex, since the customer under this model might have different thresholds or shrink rates at different times ... but for starters it could be a useful approach to get a sense of things.
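
A minimal sketch of the linear-shrinkage version, under strong assumptions that are mine rather than the answer's: the customer consumes stock at a constant daily rate and reorders exactly when the stock falls back to the threshold. Under those assumptions each purchase is used up over the following interval, so the threshold cancels out and only the rate can be estimated from sales alone. The function names are hypothetical.

```python
def fit_consumption_rate(intervals, amounts):
    """Estimate units consumed per day as total purchased / total days elapsed.

    intervals[i] is the number of days between sale i-1 and sale i, and
    amounts[i] is the size of sale i. Purchase i-1 is assumed to be used up
    during interval i, so the last purchase and first interval are excluded.
    """
    consumed = sum(amounts[:-1])
    days = sum(intervals[1:])
    return consumed / days


def days_until_next_purchase(last_amount, rate):
    """Expected days for the most recent purchase to be consumed at `rate`."""
    return last_amount / rate


# Intervals and amounts taken from the example series in the question
# (sales on days 4, 12, 17 and 22 of a zero-indexed daily series).
T = [5, 8, 5, 5]
Y = [24, 4, 17, 9]

rate = fit_consumption_rate(T, Y)
print(rate)                                   # 2.5
print(days_until_next_purchase(Y[-1], rate))  # 3.6
```

With more data, the residuals between predicted and observed intervals would show whether a constant rate is plausible or whether the rate or threshold needs to vary over time.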
