ML model to forecast time series data

This question has three sub-parts, answering each of which probably doesn't require huge text. I hope that is okay.

I'm trying to understand time series prediction using ML. I have a target variable $y_t$, and suppose there are two other variables $x_t, z_t$ (e.g. if $y_t$ were the demand for an item, $x_t$ could be the item's type or price, etc.). Also, let's say I'm using a random forest model because I've read it generally does okay out of the box.

i) From my understanding, if I include $y_{t-1}$ as a predictor, the model may just learn to predict $y_t = y_{t-1}$, for example when there is strong autocorrelation at lag $1$. Given that, is it a bad idea to include $y_{t-1}$ as a feature?

ii) Each of the predictors $x_t, z_t$ may itself have typical time series characteristics, such as non-stationarity, autocorrelation or seasonality. Is there some special method I should follow, or a transformation I should apply to a predictor, if it has any of these characteristics?

iii) Typically, what are some best practices for this kind of forecasting? My current thought is: use $x_t, z_t$ as predictors without transformation, fit ARIMA with grid-searched parameters on the training data and validate, and use that as a baseline. Then use a random forest to predict the differenced series $y_t - y_{t-1}$ from $x_{t-1}, z_{t-1}$ and compare against the baseline. Am I missing anything here, or should I consider something additional?
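Concretely, for the random forest step I have something like the sketch below in mind (the data is made up to stand in for $y_t, x_t, z_t$; the feature set, split point and hyperparameters are only placeholders):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Made-up data standing in for y_t, x_t, z_t.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({"x": rng.normal(size=n), "z": rng.normal(size=n)})
df["y"] = rng.normal(size=n).cumsum() + 0.3 * df["x"]

# Target: the differenced series dy_t = y_t - y_{t-1};
# predictors: x_{t-1} and z_{t-1}.
df["dy"] = df["y"].diff()
df["x_lag1"] = df["x"].shift(1)
df["z_lag1"] = df["z"].shift(1)
df = df.dropna()

train, test = df.iloc[:250], df.iloc[250:]
rf = RandomForestRegressor(n_estimators=300, random_state=0)
rf.fit(train[["x_lag1", "z_lag1"]], train["dy"])

# Undo the differencing: y_hat_t = y_{t-1} + predicted dy_t.
dy_hat = rf.predict(test[["x_lag1", "z_lag1"]])
y_hat = test["y"].shift(1).fillna(train["y"].iloc[-1]) + dy_hat
print("MAE vs actual y:", mean_absolute_error(test["y"], y_hat))
```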

Thanks in advance!

Topic arima random-forest time-series

Category Data Science


i) Time series models learn to predict values from many past observations, so using past values of the target such as $y_{t-1}$ as inputs is the usual approach rather than a problem. Like other ML models, they use training and validation data sets, and $y_t$ is your target variable. You can predict $t+1$ or several steps into the future, but in general, the further ahead you predict, the worse the forecast.
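For instance, a minimal sketch of a lag-feature setup with a random forest and a recursive multi-step forecast could look like this (the series here is synthetic, and the number of lags, split point and hyperparameters are arbitrary choices):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Made-up target series; replace with your own y_t.
rng = np.random.default_rng(0)
y = pd.Series(rng.normal(size=320).cumsum())

# Supervised frame with lag features y_{t-1}, ..., y_{t-5}.
lags = range(1, 6)
frame = pd.DataFrame({f"lag{k}": y.shift(k) for k in lags})
frame["y"] = y
frame = frame.dropna()

# Chronological split: never shuffle time series data.
train, test = frame.iloc[:300], frame.iloc[300:]
rf = RandomForestRegressor(n_estimators=300, random_state=0)
rf.fit(train.drop(columns="y"), train["y"])

# Recursive multi-step forecast: feed each prediction back in as a lag.
history = list(train["y"].iloc[-max(lags):])
forecasts = []
for _ in range(len(test)):
    features = pd.DataFrame([[history[-k] for k in lags]],
                            columns=[f"lag{k}" for k in lags])
    y_hat = rf.predict(features)[0]
    forecasts.append(y_hat)
    history.append(y_hat)

print("multi-step MAE:", mean_absolute_error(test["y"], forecasts))
# The error typically grows with the horizon, because prediction
# errors are fed back into the lag features.
```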

ii) It depends on the model you use. LSTMs are designed to capture many different dynamic behaviors. ARIMA, on the other hand, is a purely statistical forecasting model: it is more limited in how far ahead it can usefully predict than an LSTM, and it cannot always handle seasonality (use SARIMA for that). RNNs are well suited to small data sets. Many forecasting models are sensitive to noise, so reducing it can improve the results. I recommend studying the original publications; they are very interesting. For instance, the original LSTM paper compares it against a range of other learning algorithms: https://www.researchgate.net/publication/13853244_Long_Short-term_Memory
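If you want to check and transform individual predictors yourself, a common recipe is an augmented Dickey-Fuller stationarity test followed by differencing. Here is a minimal sketch using statsmodels (the series is made up, and the significance level is just the usual convention):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Made-up predictor: a random walk, so non-stationary on purpose.
rng = np.random.default_rng(1)
x = pd.Series(rng.normal(size=300).cumsum(), name="x")

def is_stationary(series, alpha=0.05):
    """Augmented Dickey-Fuller test: a small p-value rejects the unit root."""
    p_value = adfuller(series.dropna())[1]
    return p_value < alpha

print("raw series stationary?", is_stationary(x))          # likely False
print("differenced stationary?", is_stationary(x.diff()))  # likely True

# Seasonality can be handled similarly with seasonal differencing,
# e.g. x.diff(12) for monthly data with a yearly cycle.
```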

iii) Start with models that are easy to implement, like Random Forest, then increase the complexity with models such as SARIMA, GRU or LSTM. Depending on your data, some models will perform better than others, so it is a good idea to set up a test bench that compares several forecasting models on the same data.
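A simple test bench could be structured like this (a sketch with synthetic data; the ARIMA order, lag choices and metric are placeholders you would tune and extend):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from statsmodels.tsa.arima.model import ARIMA

# Made-up target series; swap in your own data.
rng = np.random.default_rng(2)
y = pd.Series(rng.normal(size=300).cumsum(), name="y")

train, test = y[:250], y[250:]
results = {}

# 1) Naive baseline: tomorrow = today.
naive_pred = test.shift(1).fillna(train.iloc[-1])
results["naive"] = mean_absolute_error(test, naive_pred)

# 2) ARIMA baseline (order chosen arbitrarily here; grid-search in practice).
arima = ARIMA(train, order=(1, 1, 1)).fit()
results["arima(1,1,1)"] = mean_absolute_error(test, arima.forecast(steps=len(test)))

# 3) Random forest on lag features of y (one-step-ahead with actual lags,
#    so not directly comparable to the multi-step ARIMA forecast above).
lag_frame = pd.concat({f"lag{k}": y.shift(k) for k in (1, 2, 3)}, axis=1).dropna()
target = y.loc[lag_frame.index]
rf = RandomForestRegressor(n_estimators=300, random_state=0)
rf.fit(lag_frame.loc[:249], target.loc[:249])
results["random forest"] = mean_absolute_error(target.loc[250:],
                                               rf.predict(lag_frame.loc[250:]))

print(results)
```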

Here is a good notebook that explains time series using RNN, LSTM and GRU: https://github.com/ageron/handson-ml2/blob/master/15_processing_sequences_using_rnns_and_cnns.ipynb
