How can we make forecasts from stationary data?

I'm confused about the concept of stationarity. Most definitions require the mean and variance to be constant 'over any interval'. This statement confuses me: if any interval should have the same mean and variance, then I could select a time strip as narrow as possible, say one day where the graph is at a high, and another one-day strip where the graph is at a low, and the means would obviously be different.

Say I take means over the green and blue bounds in my plot; they are going to be different, so how is this a stationary time series? Moreover, if trends and seasonality are not supposed to be present in stationary time series data, then what do models that require stationary data predict from? Trends and seasonality are 'patterns' in the data; if they are not there, what is the basis of prediction, and how is stationary time series data of any use?

Topic: linear-regression, arima, time-series

Category: Data Science


One can say that the mean is approximately constant in the ranges you point out, and that any discrepancy is due to random noise.

This does not invalidate the stationarity assumption.

You cannot take arbitrarily small intervals, compute the sample mean over each, and expect it to be constant. In the limit, this degenerates to taking any single sample and expecting it to equal the same constant value.

So your overly narrow definition would imply that only constant signals, i.e. $y(t) = c$, qualify as wide-sense stationary, which is not what wide-sense stationarity means.
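
To see this concretely, here is a small simulation (illustrative only, not your data): even for pure Gaussian noise, whose true mean really is constant, the sample mean over a very short window bounces around, while the mean over the whole series settles near the true value.

```python
import numpy as np

rng = np.random.default_rng(0)

# One realization of a stationary process: i.i.d. Gaussian noise
# with true mean 0 and true standard deviation 2.
y = rng.normal(loc=0.0, scale=2.0, size=1_000)

# The sample mean over the whole series is close to the true mean ...
print("full-series mean:", round(y.mean(), 3))

# ... but sample means over very short windows fluctuate noticeably,
# because each one averages only a handful of noisy samples.
window = 5
short_means = y[: 10 * window].reshape(-1, window).mean(axis=1)
print("first ten 5-sample window means:", np.round(short_means, 2))
```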


A stationary time series refers to the stochastic process that generates the data, not necessarily the actual sampled data values themselves.

A strictly stationary time series comes from a data-generating process whose joint distribution does not change over time; in particular, its mean and variance are constant. There are several other definitions of stationarity.
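
For reference, the weak (wide-sense) definition, which is the one most forecasting models rely on, only constrains the first two moments:

$$
\mathbb{E}[y_t] = \mu, \qquad \operatorname{Var}(y_t) = \sigma^2, \qquad \operatorname{Cov}(y_t, y_{t+h}) = \gamma(h) \quad \text{for all } t \text{ and lags } h.
$$

The lag-dependent autocovariance $\gamma(h)$ is the kind of 'pattern' that models such as ARIMA exploit for forecasting, even when there is no trend or seasonality.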

In your graph, it looks like the data was generated by a stochastic process with constant mean $\mu = 0$ and constant standard deviation $\sigma < 5$, so it is probably a stationary time series.
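
If you want a quantitative check rather than eyeballing the plot, one common option is a unit-root test such as the augmented Dickey-Fuller test from statsmodels. The sketch below runs it on simulated data with roughly those parameters (not your actual series); a small p-value rejects the unit-root null, which supports treating the series as stationary.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)

# Simulated series resembling the plot described above:
# constant mean 0, standard deviation well below 5.
y = rng.normal(loc=0.0, scale=2.0, size=500)

# Augmented Dickey-Fuller test: the null hypothesis is that the series
# has a unit root (is non-stationary); a small p-value rejects that.
adf_stat, p_value, *_ = adfuller(y)
print(f"ADF statistic = {adf_stat:.3f}, p-value = {p_value:.4f}")
```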

The interval is the duration of a single sweep of the time-series generation process, which yields one realization of the time series. It is not a problem to restrict attention to a smaller interval.

The problem arises when you don't know the original generation process and have to deduce it from one or more generated time series. If you had looked at only two data point values, it would be almost impossible to deduce stationarity of the process with any certainty. With a longer interval, and possibly multiple realizations of the time series over the same interval (impossible for real-time series, since you can't go back in time), one can deduce stationarity with more confidence.
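
One practical way to do that deduction from a single realization (a sketch on simulated data, not your series) is to compare rolling means and standard deviations over reasonably long windows; for a stationary process they should hover around the same values, up to estimation noise.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# A single observed realization of an (assumed) stationary process.
y = pd.Series(rng.normal(loc=0.0, scale=2.0, size=2_000))

# Rolling statistics over a reasonably long window: for a stationary
# process they should stay roughly flat, up to estimation noise.
window = 200
rolling_mean = y.rolling(window).mean()
rolling_std = y.rolling(window).std()

print("rolling means (every 200 steps):")
print(rolling_mean.iloc[window::window].round(2).to_list())
print("rolling standard deviations (every 200 steps):")
print(rolling_std.iloc[window::window].round(2).to_list())
```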
