Forecasting non-negative sparse time-series data

Question

Bernardo Aflalo

November 19, 2021 04:15

I have a daily time-series dataset representing the sales of a product to a customer over time. The sales look like this:

$$[0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 17, 0, 0, 0, 0, 9, 0, ...]$$

in which each number represents the sales of the product in a day.

The problem is that standard time-series forecasting methods (ARMA, Holt-Winters) work well for "continuous" and "smooth" data but do not produce good results in this case.

I want to forecast this series while paying attention to two points: (1) ensuring non-negative values and (2) handling sparse, non-continuous data. Does anyone know how to approach this problem? Which methods or techniques apply?

Thanks!

Topic forecast time-series

Category Data Science

JohnzW · Accepted Answer · July 14, 2019 14:02

In this type of data, information comes from two places:

Time interval between sales, $T_i$: the time between $Sale_{i-1}$ and $Sale_i$
Amount of $Sale_i$: $Y_i$

Similar to a previous answer, I suggest starting by checking the time intervals for any potential patterns. The second step is to check the relationship between the sale amount $Y_i$ and $Y_{i-1}$, $T_i$, and $T_{i-1}$. In the most general case, the intervals and the amounts are correlated. Based on this, you can decide whether to model them independently or jointly. One commonly used model is a state-space model. The basic idea is to decompose the sparse time series into sub-components that are not sparse and are easier to model.
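As a rough illustration of these checks, the sketch below splits a daily series into intervals $T_i$ and amounts $Y_i$ and computes two sample correlations. The function names `decompose` and `pearson` are hypothetical, and the series is only the visible prefix of the one in the question. With four sales the correlations are far too noisy to act on, so this only shows the mechanics.

```python
import math


def decompose(series):
    """Split a sparse daily series into (intervals, amounts).

    intervals[i] is the number of days between sale i-1 and sale i
    (for the first sale: days since the start of the series, an assumption);
    amounts[i] is the size of sale i.
    """
    intervals, amounts = [], []
    last_sale_day = -1
    for day, value in enumerate(series):
        if value > 0:
            intervals.append(day - last_sale_day)
            amounts.append(value)
            last_sale_day = day
    return intervals, amounts


def pearson(x, y):
    """Sample Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Visible prefix of the series from the question (the rest is elided there).
series = [0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 17, 0, 0, 0, 0, 9, 0]
intervals, amounts = decompose(series)

# Does the amount depend on the previous amount, or on the wait before it?
corr_y_prev_y = pearson(amounts[1:], amounts[:-1])
corr_y_t = pearson(amounts, intervals)
print(intervals, amounts, corr_y_prev_y, corr_y_t)
```

If both correlations are close to zero on a long enough history, modeling the intervals and the amounts independently is a reasonable first attempt; otherwise a joint model is worth considering.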

One general model could be:

$$T_i = g(T_{i-1}, Y_{i-1}) + e_i,$$

$$Y_i = f(Y_{i-1}, T_i) + h_i$$
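One very simple instance of this model, shown here only as a sketch, takes $g$ and $f$ to be exponentially smoothed levels of the past intervals and past amounts, ignoring any cross-dependence between them. This is essentially Croston's method, a technique swapped in for illustration and not named in the answer above. The function name `croston_style_rate` and the smoothing parameter `alpha` are assumptions.

```python
def croston_style_rate(series, alpha=0.1):
    """Forecast expected sales per day from a sparse daily series.

    Smooths sale amounts and inter-sale intervals separately, then
    returns smoothed_amount / smoothed_interval. For non-negative input
    and 0 < alpha <= 1 the result is non-negative.
    """
    smoothed_amount = None
    smoothed_interval = None
    days_since_sale = 0
    for value in series:
        days_since_sale += 1
        if value > 0:
            if smoothed_amount is None:
                # Initialise both levels from the first observed sale.
                smoothed_amount = float(value)
                smoothed_interval = float(days_since_sale)
            else:
                smoothed_amount += alpha * (value - smoothed_amount)
                smoothed_interval += alpha * (days_since_sale - smoothed_interval)
            days_since_sale = 0
    if smoothed_amount is None:
        return 0.0  # no sales observed at all
    return smoothed_amount / smoothed_interval


# Visible prefix of the series from the question (the rest is elided there).
series = [0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 17, 0, 0, 0, 0, 9, 0]
rate = croston_style_rate(series)
print("expected sales per day:", rate)
```

The output is a per-day demand rate, not a prediction of which day the next sale falls on. Letting $g$ and $f$ depend on each other's inputs, as in the equations above, would need a richer fit.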

lmjohns3 · Accepted Answer · March 3, 2017 17:37

I have two ideas here, maybe they will be helpful.

Idea 1: Model time between events

You might think of your data as being generated by two processes: the first is a distribution over time intervals, and the second is a distribution over purchase amounts. So to model your data you could fit one distribution (Gaussian?) to the nonzero values in your dataset, and another (Poisson?) to the lengths of the runs of zeros.
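A minimal sketch of this idea, under stated assumptions: the run lengths of zeros are treated as Poisson, the nonzero amounts as Gaussian, and sampled amounts are clipped at zero to keep the simulated path non-negative. The clipping is my addition, as are the hypothetical names `fit_two_processes`, `sample_poisson`, and `simulate`. Only runs of zeros that end in a sale are used, so the trailing run is ignored.

```python
import math
import random
import statistics


def fit_two_processes(series):
    """Return (mean zero-run length, mean amount, std of amounts)."""
    runs, amounts, current_run = [], [], 0
    for value in series:
        if value > 0:
            runs.append(current_run)  # zeros seen before this sale
            amounts.append(value)
            current_run = 0
        else:
            current_run += 1
    # statistics.stdev needs at least two sales.
    return statistics.mean(runs), statistics.mean(amounts), statistics.stdev(amounts)


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(series, horizon, seed=0):
    """Simulate `horizon` future days: a run of zeros, then one sale, repeated."""
    rng = random.Random(seed)
    mean_run, mean_amount, std_amount = fit_two_processes(series)
    path = []
    while len(path) < horizon:
        path.extend([0.0] * sample_poisson(mean_run, rng))
        # Clip at zero so the simulated sales are never negative.
        path.append(max(0.0, rng.gauss(mean_amount, std_amount)))
    return path[:horizon]


# Visible prefix of the series from the question (the rest is elided there).
series = [0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 17, 0, 0, 0, 0, 9, 0]
path = simulate(series, horizon=30)
print(path)
```

Averaging many simulated paths with different seeds gives a forecast distribution per day. A distribution with positive support for the amounts would avoid the clipping step.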

Idea 2: Model customer inventory

Even though the sales events in your dataset are sparse, you could spend a little time to come up with a model of why the customer is making purchases when they do. In one possible model, the customer has an inventory that shrinks over time, and they make purchases when their inventory crosses some minimum threshold. You could use your sales data to fit the slope (for linear shrinkage) or rate (for exponential shrinkage) as well as the threshold.

This could get arbitrarily complex, since the customer under this model might have different thresholds or shrink rates at different times ... but for starters it could be a useful approach to get a sense of things.
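To make the linear-shrinkage version concrete, here is a rough sketch under strong assumptions: the customer consumes stock at a constant rate, reorders whenever the inventory falls back to a fixed threshold, and each purchase is exactly what was bought on that day. Under those assumptions a purchase of size $Y$ lasts $Y / \text{rate}$ days and the threshold cancels out, so this sketch fits only the rate. The function names are hypothetical.

```python
def fit_consumption_rate(series):
    """Estimate units consumed per day under linear inventory shrinkage.

    Every purchase except the last has been fully consumed by the time
    of the next purchase, so rate = (units bought before the last sale)
    / (days between first and last sale).
    """
    sale_days = [day for day, value in enumerate(series) if value > 0]
    if len(sale_days) < 2:
        raise ValueError("need at least two sales to estimate a rate")
    consumed = sum(series[day] for day in sale_days[:-1])
    elapsed = sale_days[-1] - sale_days[0]
    return consumed / elapsed


def days_until_next_purchase(series):
    """Predicted days from the last sale until the next one."""
    rate = fit_consumption_rate(series)
    last_sale_day = max(day for day, value in enumerate(series) if value > 0)
    return series[last_sale_day] / rate


# Visible prefix of the series from the question (the rest is elided there).
series = [0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 17, 0, 0, 0, 0, 9, 0]
rate = fit_consumption_rate(series)
wait = days_until_next_purchase(series)
print("rate per day:", rate, "days until next purchase:", wait)
```

Comparing the predicted waits with the observed intervals is a quick way to see whether a constant rate is plausible for a given customer before adding time-varying rates or thresholds.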
