LSTM Produces Random Predictions
I have trained an LSTM in PyTorch on financial data, where a series of 14 values predicts the 15th. I split the data into train, test, and validation sets and trained the model until the loss stabilized. Everything looked good when I used the model to predict on the validation data.
When I was writing up my research to explain to my manager, I noticed that I got different predicted values each time I ran the model (prediction only) on the same input values. This is not what I expected, so I read some literature, but I was not able to explain my results. Intuitively, my results indicate that there is some random variable, node, or gate that influences the prediction, but I cannot figure out where it is or whether/how it can be configured.
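Before digging into the model itself, I wanted to rule out the usual inference-time randomness. The sketch below shows the kind of check I mean; model and sample_input are placeholders for my trained network and one fixed validation window, and reseeding the global RNG before each call is only a probe for whether any random draw still happens at prediction time:

```python
import torch

# model / sample_input are placeholders for my trained network and one
# fixed validation window.
model.eval()               # inference mode: disables dropout and the like
with torch.no_grad():      # prediction only, no autograd
    torch.manual_seed(0)
    pred_a = model(sample_input)
    torch.manual_seed(0)
    pred_b = model(sample_input)

# If the predictions match only when the seed is reset like this, some
# random draw is still being made at prediction time.
print(torch.allclose(pred_a, pred_b))
```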
Here is my model definition:
import torch
import torch.nn as nn


class TimeSeriesNNModel(nn.Module):
    def __init__(self):
        super(TimeSeriesNNModel, self).__init__()
        self.lstm1 = nn.LSTM(input_size=14, hidden_size=50, num_layers=1)
        self.lstm2 = nn.LSTM(input_size=50, hidden_size=25, num_layers=1)
        self.linear = nn.Linear(in_features=25, out_features=1)
        self.h_t1 = None
        self.c_t1 = None
        self.h_t2 = None
        self.c_t2 = None

    def initialize_model(self):
        # Draw fresh hidden and cell states for both LSTM layers
        self.h_t1 = torch.rand(1, 1, 50, dtype=torch.double)
        self.c_t1 = torch.rand(1, 1, 50, dtype=torch.double)
        self.h_t2 = torch.rand(1, 1, 25, dtype=torch.double)
        self.c_t2 = torch.rand(1, 1, 25, dtype=torch.double)

    def forward(self, input_data, future=0):
        outputs = []
        self.initialize_model()
        output = None
        # Feed the sequence one chunk at a time, carrying the states along
        for input_t in input_data.chunk(input_data.size(1), dim=1):
            # nn.LSTM returns (output, (h_n, c_n)), so unpack both parts
            out1, (self.h_t1, self.c_t1) = self.lstm1(input_t, (self.h_t1, self.c_t1))
            out2, (self.h_t2, self.c_t2) = self.lstm2(out1, (self.h_t2, self.c_t2))
            output = self.linear(out2)
            outputs += [output]
        outputs = torch.stack(outputs, 1).squeeze(2)
        return outputs
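Running even an untrained copy of this class twice on the same tensor already shows the behaviour. This is only a minimal sketch: the (1, 14, 14) input shape is my assumption about a layout that satisfies the chunking in forward(), not necessarily how the real data is arranged.

```python
import torch

# Minimal repro with the class above; untrained weights and a made-up
# window are enough. The (1, 14, 14) shape is only an assumption that
# satisfies the chunking in forward().
torch.manual_seed(0)
model = TimeSeriesNNModel().double()   # .double() to match the hidden states
model.eval()

window = torch.rand(1, 14, 14, dtype=torch.double)  # one fixed input window

with torch.no_grad():
    first = model(window)
    second = model(window)

# Same weights, same input -- yet the two calls do not return the same values.
print(torch.allclose(first, second))
```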
If anyone can point out what is wrong with my model or my understanding, I'd be really grateful.
UPDATE 1: Researching the Forget Gate
I suspect that the variability is introduced by the LSTM forget gate, although I cannot see how.
In Deep Learning, by Goodfellow, Bengio, and Courville, on page 399, the authors present the formula for the forget gate as:
the self-loop weight (or the associated time constant) is controlled by a forget gate unit $f_i^{(t)}$ (for the step $t$ and cell $i$), which sets this weight to a value between 0 and 1 via a sigmoid unit:
$$f_i^{(t)} = \sigma (b_i^f + \sum_j U_{i,j}^fx_j^{(t)} + \sum_j W_{i,j}^fh_j^{(t-1)})$$
where $x^{(t)}$ is the current input vector and $h^{(t)}$ is the current hidden layer vector, containing the outputs of all the LSTM cells, and $b^f$, $U^f$, $W^f$ are respectively the biases, input weights, and recurrent weights for the forget gates.
Again, from Deep Learning by Goodfellow et al. page 397, the authors state,
A crucial addition has been to make the weight of the self-loop conditioned on the context, rather than fixed (Gers et al., 2000). By making the weight of this self-loop gated (controlled by another hidden unit), the time scale of integration can be changed dynamically. In this case, we mean that even for an LSTM with fixed parameters, the time scale of integration can change based on the input sequence, because the time constants are output by the model itself.
I read this as saying that the time frame that can impact $y^{(t)}$ varies with the input sequence; but since the input sequence itself is fixed, I would expect consistent predictions from the forget gate.
What I don’t understand is how this would introduce a stochastic effect. The weights, once trained, are constant, and the inputs don’t change either, just like in any other NN, right?
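For my own sanity, I also wrote the forget gate from the formula above as plain tensor code (the sizes and the random stand-in weights below are arbitrary). Evaluated twice with the same input, hidden state, and weights, it returns identical values, so I don't see where a stochastic effect could enter:

```python
import torch

# Forget gate straight from the quoted formula; sizes are arbitrary and the
# weights are random stand-ins for trained (hence fixed) parameters.
n_cells, n_inputs = 25, 14
torch.manual_seed(0)
b_f = torch.rand(n_cells)            # biases b^f
U_f = torch.rand(n_cells, n_inputs)  # input weights U^f
W_f = torch.rand(n_cells, n_cells)   # recurrent weights W^f

x_t = torch.rand(n_inputs)           # fixed current input x^(t)
h_prev = torch.rand(n_cells)         # fixed previous hidden state h^(t-1)

def forget_gate(x, h):
    # f^(t) = sigmoid(b^f + U^f x^(t) + W^f h^(t-1)), elementwise in (0, 1)
    return torch.sigmoid(b_f + U_f @ x + W_f @ h)

f_a = forget_gate(x_t, h_prev)
f_b = forget_gate(x_t, h_prev)
print(torch.equal(f_a, f_b))  # True: same inputs and weights give the same gate
```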
Topic stacked-lstm pytorch lstm rnn
Category Data Science