ML model to forecast time series data
This question has three sub-parts, each of which can probably be answered briefly. I hope that is okay.
I'm trying to understand time series prediction using ML. I have the target variable $y_t$, and suppose two other variables $x_t,z_t$ (e.g. if $y_t$ were the demand of an item, $x_t$ could be type of item or price of item, etc.). Also, let's say I'm using a random forest model because I've read it generally does okay out of the box.
i) From my understanding, if I include $y_{t-1}$ as a predictor, the model may just learn to predict $y_t=y_{t-1}$, for example if there is strong autocorrelation at lag $1$. Given that, is it a bad idea to include $y_{t-1}$ as a feature?
ii) Each of the predictors $x_t,z_t$ may show typical time series characteristics such as non-stationarity, autocorrelation, or seasonality. Is there a particular method or transformation I should apply to a predictor that shows one of these characteristics?
iii) Typically, what are some best practices for this kind of forecasting? My current plan is: use $x_t,z_t$ as predictors without transformation. Fit ARIMA with grid-searched parameters on the training data, validate it, and use it as the baseline. Finally, use a random forest to predict the differenced series $y_t-y_{t-1}$ with $x_{t-1},z_{t-1}$ as predictors, and compare it to the baseline. Am I missing anything here, or is there something additional I should consider?
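To make the plan in iii) concrete, here is a minimal sketch of what I have in mind. It is only an illustration under some assumptions: the data is synthetic (generated by a hypothetical `make_synthetic` helper, not real demand data), the function names are my own, and a naive persistence forecast ($\hat{y}_t = y_{t-1}$) stands in for the ARIMA baseline to keep the example small. The random forest predicts the differenced target $y_t-y_{t-1}$ from $x_{t-1},z_{t-1}$, and the split is chronological rather than shuffled.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error


def make_synthetic(n=400, seed=0):
    """Synthetic stand-in data (an assumption for illustration only)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z = rng.normal(size=n)
    y = np.zeros(n)
    for t in range(1, n):
        # y is a random walk whose step depends on last period's x and z
        y[t] = y[t - 1] + 0.8 * x[t - 1] - 0.5 * z[t - 1] + 0.1 * rng.normal()
    return y, x, z


def evaluate(y, x, z, train_frac=0.8, seed=0):
    """Compare a random forest on the differenced target with persistence."""
    # Features for time t are x_{t-1}, z_{t-1}; the target is y_t - y_{t-1}
    X = np.column_stack([x[:-1], z[:-1]])
    dy = np.diff(y)   # dy[i] = y[i+1] - y[i]
    y_prev = y[:-1]   # last observed level for each target
    y_true = y[1:]

    # Chronological split: train on the past, test on the future (no shuffling)
    split = int(train_frac * len(dy))
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X[:split], dy[:split])

    # One-step-ahead forecast: add the predicted change to the last level
    rf_pred = y_prev[split:] + model.predict(X[split:])
    # Persistence baseline: predict that nothing changes
    naive_pred = y_prev[split:]

    return {
        "rf_mae": mean_absolute_error(y_true[split:], rf_pred),
        "naive_mae": mean_absolute_error(y_true[split:], naive_pred),
    }


if __name__ == "__main__":
    scores = evaluate(*make_synthetic())
    print(scores)
```

Comparing against the persistence forecast also relates to i): if a model with $y_{t-1}$ as a feature does not beat `naive_mae` on the held-out period, it has probably just learned to copy the last value.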
Thanks in advance!
Topic arima random-forest time-series
Category Data Science