Why are predictions from my LSTM Neural Network lagging behind true values?

I am running an LSTM neural network in R using the keras package, in an attempt to do time series prediction of Bitcoin. The issue I'm running into is that while my predicted values seem to be reasonable, for some reason, they are lagging or behind the true values. Right below is some of my code, and farther down I have some graphs to show you what I mean. My model code:

batch_size = 2

model <- keras_model_sequential()

model %>%
  layer_lstm(units = 22,
             batch_input_shape = c(batch_size, 1, 22), use_bias = TRUE, stateful = TRUE,
             return_sequences = TRUE) %>%
  layer_lstm(units = 16, batch_input_shape = c(batch_size, 1, 22), stateful = TRUE, return_sequences = TRUE) %>%
  layer_dense(units = 1)

model %>% compile(
  loss = 'mean_absolute_error',
  optimizer = optimizer_adam(lr = 0.00004, decay = 0.000004),
  metrics = c('mean_absolute_error')
)
summary(model)

Epochs <- 50
for (i in 1:Epochs) {
  print(i)
  model %>% fit(x_train, y_train, epochs = 1, batch_size = batch_size, verbose = 1, shuffle = FALSE)
  model %>% reset_states()
}

So, in case that's not clear: the network has an input LSTM layer with 22 units (equal to my number of predictor variables), one middle LSTM layer with 16 units, and a single-unit dense output layer.
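For context on the shapes involved: because stateful = TRUE is used with a fixed batch_input_shape, the number of training samples has to be divisible by batch_size, and because the last LSTM layer keeps return_sequences = TRUE, the target needs a matching 3D shape. My arrays look roughly like the sketch below (made-up dimensions and random placeholder data, not my actual preprocessing):

# Sketch of the array shapes fed to the stateful LSTM above (placeholder data only).
n_features <- 22
n <- 5000                          # hypothetical number of hourly observations
n <- n - (n %% batch_size)         # stateful = TRUE requires samples %% batch_size == 0

# x: [samples, timesteps = 1, features = 22]
x_train <- array(rnorm(n * n_features), dim = c(n, 1, n_features))
# y: [samples, timesteps = 1, 1], since the last LSTM layer has return_sequences = TRUE
y_train <- array(rnorm(n), dim = c(n, 1, 1))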

Here is a graph of the training data fit (blue is fit, red are true values):

I am predicting the Bitcoin price 24 hours ahead. I have hourly data, so I set this up by simply shifting the Bitcoin price column of my data back by 24 steps, so that past predictor conditions are matched with the future outcome.
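To make that concrete, the shift amounts to something like the following (a simplified sketch of the idea rather than my exact preprocessing code; btc and price are placeholder names):

# Match the predictors at hour t with the BTC price 24 hours later.
library(dplyr)

horizon <- 24   # predict this many hours ahead

btc_shifted <- btc %>%
  mutate(target = lead(price, horizon)) %>%   # outcome = BTC price 24 hours later
  filter(!is.na(target))                      # drop the last 'horizon' rows with no future price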

From the picture above, you can see that the training fit is very strong. However, take a look at my out-of-sample predictions vs. true values (again, blue line is model prediction, red line is true value):

At first glance, it's really not shabby. However, if you take a closer look (and it becomes VERY obvious when I zoom in to smaller time scales), the predicted blue line often lags behind the true red line:

The odd thing to me is that this is not a consistent problem. If you look at some of the movements towards the right of the graph, the model gets them on target, with no lag. Additionally, from zooming in and looking carefully, I have found that the apparent lag is not consistent in magnitude either, ranging from around 14 hours to sometimes 22 hours (which makes the prediction barely usable: it is nominally 24 hours ahead but lags the true value by 22 hours, so I'm really only getting about 2 hours of genuine lead time).

I have tried increasing my batch size (to 5, 10, 30), which doesn't make the problem any better (it might even make it worse). I also tried increasing the size of my middle layer (to 20, 30, 44 units), which didn't fix the issue either. Using mean absolute error as the loss function SEEMS to work better than mean squared error, but what you're looking at is already the MAE version, so the problem obviously persists.

Around half of my inputs into the neural network model are lagged values of the Bitcoin price (BTC price 24 hours ago, 25 hours ago etc.), so I thought maybe the problem was that my model was simply grabbing those past values and replicating them because the model couldn't find any other meaningful connections to my predictors. However,

  1. You can see that the problem doesn't exist in the training dataset fit, so I don't think this is an issue of my model only using past price values as its best guess.
  2. I tried changing which past lags were used (for example, using the value from 30 hours ago instead of 24 hours ago). However, this didn't make a difference, so I'm fairly confident now that the issue is not that my model is relying solely on past price values.

As a result, I really have no idea where this gap is coming from.

Any advice, suggestions or tips would be appreciated on how I could deal with this odd gap. Thank you very much!

EDIT (please read entirely, important): In order to once-and-for-all test the idea that it's the lagged time series inputs causing the issue, I just ran the neural network with all past values of the price removed. As in, ALL of the inputs were exogenous variables, no time-series lagged values, and while it's a bit hard to tell (because the predictions are messier), the problem appears to persist. Take a look:

I think this pretty much definitively proves that the lag is not coming from past price values being replicated. HOWEVER, I looked at the training data fit for the model with no time-series inputs, and it is obvious that it ALSO has an offset/lag. Example:

One more thing I need to mention: when I run this neural network on the same data but without offsetting the predictors from the outcomes, there is no problem. That is to say, when I run the data without shifting the Bitcoin price column back, so that the network is matching current conditions to the current price, this prediction offset does not exist. In fact, I've been playing around with this offset (trying to predict 12, 24, 48 and 72 hours ahead), and changing it changes the lag in the predictions. I have no idea why. When I switch to predicting 72 hours ahead, the prediction lag isn't exactly 72 hours (just as it isn't exactly 24 hours when I'm predicting 24 hours ahead), but the lag noticeably increases or decreases when I increase or decrease how far ahead I'm trying to predict.

EDIT 2: I am now quite certain that I'm making some mistake in my data processing. Since the prediction offset increases/decreases with how far ahead I'm trying to predict, I tried making the prediction horizon negative (-20 hours, to be exact). Here's what I saw:

Sure enough, the predictions are now significantly ahead of the actual values. As a result, I think I'm making some kind of basic data processing error. As of right now, I haven't found the error yet though.
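In case it's useful to anyone checking the same thing: rather than eyeballing zoomed-in plots, one way to quantify the offset is to cross-correlate the predicted and true series. A sketch (preds and actuals stand in for whatever vectors come out of my prediction step):

# Estimate the effective lag between predictions and true values: the lag at which
# the cross-correlation peaks is roughly how many hours the predictions trail
# (positive) or lead (negative) the actual series.
cc <- ccf(as.numeric(preds), as.numeric(actuals), lag.max = 48, plot = FALSE)
best_lag <- cc$lag[which.max(cc$acf)]
cat("Predictions appear shifted by", best_lag, "hours relative to the true series\n")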

Tags: lstm, prediction, rnn, time-series



I've been fiddling a bit with LSTMs myself to predict wind speeds from inertial drone data, and some of my plots had a similar "offset" to yours. Have you scaled your inputs using a MinMax or Standard scaler? I've also had a surprising amount of success predicting the wind speeds with a KNN algorithm, with mean bias errors often lower than those of the LSTM.
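In case it helps, a min-max scaler fit on the training period only (so nothing leaks from the test set) can be as simple as the sketch below, assuming your raw features sit in plain numeric matrices before being reshaped for the LSTM (the variable names are placeholders):

# Min-max scale each feature using statistics from the training split only,
# then apply the same parameters to the test split to avoid leakage.
mins   <- apply(x_train_raw, 2, min)
ranges <- apply(x_train_raw, 2, max) - mins
ranges[ranges == 0] <- 1                        # guard against constant columns

x_train_scaled <- sweep(sweep(x_train_raw, 2, mins, "-"), 2, ranges, "/")
x_test_scaled  <- sweep(sweep(x_test_raw,  2, mins, "-"), 2, ranges, "/")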


Welcome to the site.

I think you were right that the prediction lags behind the true value because the series is autoregressive (i.e. a strong way to predict tomorrow's value is "it will be about the same as today"). Your model therefore corrects itself with the new information whenever it misses a big jump. In other words, if the price jumps one day and your model does not predict that, it has learned to take the higher price into account when predicting the next day's price.

In response to your numbered points above:

  1. Is this based on eyeballing the data? Can you show us any results that prove the model behaves differently during training?
  2. Are you sure that when you change the lag to 30 hours, the lag in prediction doesn’t just change to a 30-hour lag, as we’d expect from an autoregressive model?

I recommend using another model as a baseline (e.g. Facebook Prophet) and checking whether, at the points where your model's predictions differ significantly from the baseline's, your model is actually more correct. This gives you a more rigorous alternative to troubleshooting your data by eye. Where your model is less accurate, you can then look at the kinds of inputs it received at those time steps.
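An even simpler baseline than Prophet is naive persistence ("the price 24 hours from now equals the price now"); if the LSTM's error isn't clearly lower than this, it is mostly echoing recent values, which is exactly what produces lagged-looking plots. A sketch (the price vector is a placeholder for your hourly test-period series):

# Naive persistence baseline: predict the price 24 hours ahead as the current price.
horizon    <- 24
naive_pred <- head(price, -horizon)   # forecast: the current price
naive_true <- tail(price, -horizon)   # the actual price 24 hours later
naive_mae  <- mean(abs(naive_true - naive_pred))
cat("Persistence baseline MAE:", naive_mae, "\n")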


Welcome to Data Science on Stack Exchange.
This is a common question: predicting future prices, i.e. forecasting. The gap you see comes from the largely random nature of prices like this, together with the underlying complexity of the problem. Unless there is a genuine time pattern in the data, an LSTM won't predict well, and it will perform especially poorly if the series changes direction often, going up and down in value.
A lot of discussion goes on about which model you should use, but I'm not sure any one of them is consistently the best. For some general ideas on different techniques, applied in this case to the stock market, here is a good reference:

https://www.analyticsvidhya.com/blog/2018/10/predicting-stock-price-machine-learningnd-deep-learning-techniques-python/

It mentions LSTM (popular if you believe in momentum), as well as ARIMA, FBProphet, and others.
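As a concrete starting point, an ARIMA baseline with the forecast package takes only a few lines (a sketch, assuming an hourly price vector named price and a 24-hour horizon):

# ARIMA baseline on the raw hourly price series (forecast package).
library(forecast)

fit <- auto.arima(price)       # automatic ARIMA order selection
fc  <- forecast(fit, h = 24)   # 24-hour-ahead forecast
plot(fc)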

However, there are usually many other variables that have a large influence on the future price that you won't get this way, such as sentiment, news articles or announcements, etc. You might look at using an ensemble, maybe LSTM + 1 or 2 other models, and combine these different inputs so they can be included in your prediction.
