Composite Input into Seq2Seq LSTM Network

We have a seq2seq problem where the input sequence consists of multiple inputs, not just one as in traditional seq2seq problems. For example, in language translation we usually feed the source sentence to an LSTM encoder, decode on the other side, and compare with the target language, so the input is essentially a matrix of one-hot encoded vectors. Now, is it possible, or do you know a variation …
Category: Data Science
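One common way to handle a composite input is to concatenate the one-hot token matrix with any extra per-timestep features along the feature axis, so the encoder still consumes a single 3-D tensor. A minimal numpy sketch; all shapes and names here are illustrative, not from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, vocab, extra = 2, 6, 10, 3

# one-hot encoded tokens: (batch, timesteps, vocab)
tokens = rng.integers(0, vocab, size=(batch, timesteps))
one_hot = np.eye(vocab)[tokens]

# additional per-timestep inputs (e.g. numeric features): (batch, timesteps, extra)
features = rng.normal(size=(batch, timesteps, extra))

# concatenate along the feature axis; an LSTM encoder consumes this directly
encoder_input = np.concatenate([one_hot, features], axis=-1)
print(encoder_input.shape)  # (2, 6, 13)
```

The same idea works in Keras by concatenating two Input tensors before the encoder LSTM.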

What is the reason behind Keras choice of default (recurrent) activation functions in LSTM networks

Activation function between LSTM layers In the linked question, whether activation functions are required for LSTM layers was answered as follows: since an LSTM unit already contains multiple non-linear activation functions, it is not necessary to add a (recurrent) activation function. My question: is there a specific reason why Keras by default uses a "tanh" activation and a "sigmoid" recurrent_activation if those activations are not necessary? I mean, for a Dense layer the default activation …
Category: Data Science
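The defaults mirror the standard LSTM cell equations: the gates act as soft switches and therefore need outputs in (0, 1), which sigmoid provides, while the candidate values and the emitted hidden state are squashed into (-1, 1) by tanh, keeping the state bounded over long sequences. A single cell step in numpy; the pre-activations z here are random stand-ins for Wx + Uh + b:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
d = 8
c_prev = np.zeros(d)
z = rng.normal(size=(4, d))  # stand-in pre-activations for the 4 gates

# gates must lie in (0, 1) to act as soft switches -> sigmoid (recurrent_activation)
f, i, o = sigmoid(z[0]), sigmoid(z[1]), sigmoid(z[2])
# candidate values are squashed to (-1, 1) -> tanh (activation)
g = np.tanh(z[3])

c = f * c_prev + i * g
h = o * np.tanh(c)
print(h.min() > -1 and h.max() < 1)  # True: the hidden state stays bounded
```

Swapping sigmoid for, say, ReLU on the gates would break the gating interpretation, since a gate value above 1 amplifies instead of filters.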

Activation function between LSTM layers

I'm aware the LSTM cell uses both sigmoid and tanh activation functions internally; however, when creating a stacked LSTM architecture, does it make sense to pass their outputs through an additional activation function (e.g. ReLU)? So do we prefer this: model = LSTM(100, activation="relu", return_sequences=True, input_shape=(timesteps, n_features)) model = LSTM(50, activation="relu", return_sequences=True)(model) ... over this? model = LSTM(100, return_sequences=True, input_shape=(timesteps, n_features)) model = LSTM(50, return_sequences=True)(model) ... From my empirical results when creating an LSTM autoencoder, I've found them to be quite similar.
Category: Data Science
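One way to see why an extra activation between layers adds little: the hidden state a first LSTM layer emits is already squashed into (-1, 1) by tanh, so following it with a ReLU merely zeroes the negative half of the activations. A toy numpy illustration with made-up shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical hidden states from a first LSTM layer; tanh keeps them in (-1, 1)
h = np.tanh(rng.normal(size=(4, 10, 50)))  # (batch, timesteps, units)

h_relu = np.maximum(h, 0.0)  # ReLU applied between the stacked layers

# ReLU on top of tanh just discards the negative half of the signal
zeroed = (h_relu == 0).mean()
print(zeroed)  # roughly 0.5
```

Note that passing activation="relu" in Keras replaces the internal tanh rather than appending a ReLU, which is a different (and sometimes less stable) change.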

How can I detect anomalies/outliers in my online streaming data on a real-time basis?

Say I have a huge (effectively unbounded) stream of data consisting of alternating sine waves and step pulses, one after the other. What I want from my model is to parse the data sequence-wise or point-wise: the first time it parses a sine wave and starts encountering step pulses, it should raise an alert for an outlier, but as it keeps parsing the data it must recognise the alternating sine and step pulses and treat them as normal …
Category: Data Science
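For this kind of regime change, a simple streaming baseline is a rolling z-score: flag a point when it deviates strongly from a sliding window, and let the detector adapt as the new regime fills the window. A sketch; the window size and threshold are illustrative, not tuned:

```python
import numpy as np

class RollingZScoreDetector:
    """Flags a point as anomalous when it deviates from a sliding-window mean."""
    def __init__(self, window=50, threshold=3.0):
        self.window = window
        self.threshold = threshold
        self.buf = []

    def update(self, x):
        anomalous = False
        if len(self.buf) >= self.window:
            mu = np.mean(self.buf)
            sigma = np.std(self.buf) + 1e-9
            anomalous = abs(x - mu) / sigma > self.threshold
        self.buf.append(x)
        self.buf = self.buf[-self.window:]  # keep only the last `window` points
        return anomalous

det = RollingZScoreDetector()
t = np.arange(200)
stream = np.sin(0.3 * t)   # sine regime
stream[100:] = 5.0         # sudden step regime
flags = [det.update(v) for v in stream]
print(flags[100])  # True: the first step sample is flagged as an outlier
```

Once the window fills with step values, further steps stop being flagged, which is the adaptive behaviour the question asks about; a production system would need a more robust model (e.g. a learned forecaster with prediction-error thresholds).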

Wiggle in the initial part of an LSTM prediction

I am working on using LSTMs and GRUs to make time series predictions. For the most part the predictions are pretty good. However, there seems to be a wiggle (an initial up-then-down) before the prediction settles out, similar to the left side of this figure from another question. In my case it is also causing a slight offset. Does anyone have any idea why this might be the case? Below are the shapes of the training and test sets, as well …
Category: Data Science

LSTM for multiple time series regression with extremely large ranges

I have the following question for those who have encountered the same dilemma: my goal is to develop an LSTM RNN for multi-step prediction of multiple time series representing the daily sales of different products. The problem I face is that the time series have very different ranges, some below 100 units per observation, others above 10000. Taking into account that I want a single model that learns all the different time series, I built …
Category: Data Science
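A common workaround when one model must serve series with ranges from tens to tens of thousands is to scale each series independently and keep the scaler parameters, so predictions can be mapped back to each product's original units. A minimal sketch with made-up sales numbers:

```python
import numpy as np

# two hypothetical daily-sales series with very different ranges
series = {
    "product_a": np.array([80.0, 95.0, 60.0, 110.0, 70.0]),
    "product_b": np.array([9000.0, 12000.0, 8000.0, 15000.0, 11000.0]),
}

# min-max scale each series independently, keeping (lo, hi) for inversion
scalers, scaled = {}, {}
for name, s in series.items():
    lo, hi = s.min(), s.max()
    scalers[name] = (lo, hi)
    scaled[name] = (s - lo) / (hi - lo)

print(scaled["product_a"].min(), scaled["product_b"].max())  # 0.0 1.0
```

After prediction, `pred * (hi - lo) + lo` recovers the original scale; a per-series standardisation (mean/std) or a log transform are common alternatives for heavy-tailed sales data.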

How are multi-layer LSTMs interconnected?

I am trying to understand the layers in an LSTM for my own implementation in Python. I started with Keras to get familiar with the layer flow. I tried the code below in Keras and made the following observations: # LSTM MODEL step_size = 3 model = Sequential() model.add(LSTM(32, input_shape=(2, step_size), return_sequences=True)) model.add(LSTM(18)) model.add(Dense(1)) model.add(Activation('linear')) And I got the summary details below for this implementation. I tried to understand the internal structure of these layers and …
Category: Data Science
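One way to check how the layers connect is to reproduce the parameter counts from the Keras summary: an LSTM layer has four gates, each with an input kernel, a recurrent kernel and a bias, and the second layer's input dimension is the first layer's unit count. A small helper; the values should match what Keras reports for LSTM(32) on 3 input features followed by LSTM(18):

```python
def lstm_param_count(units, input_dim):
    # 4 gates, each with a kernel (input_dim x units),
    # a recurrent kernel (units x units) and a bias (units)
    return 4 * (units * (input_dim + units) + units)

step_size = 3
print(lstm_param_count(32, step_size))  # 4608 for the first LSTM(32)
print(lstm_param_count(18, 32))         # 3672: layer 2's input_dim is layer 1's 32 units
```

The fact that the second count uses input_dim=32 rather than 3 is exactly the interconnection being asked about: layer 2 never sees the raw input, only layer 1's hidden-state sequence.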

LSTM Produces Random Predictions

I have trained an LSTM in PyTorch on financial data where a series of 14 values predicts the 15th. I split the data into Train, Test, and Validation sets. I trained the model until the loss stabilized. Everything looked good when using the model to predict on the Validation data. When I was writing up my research to explain to my manager, I noticed that I got different predicted values each time I ran the model (prediction only) on …
Category: Data Science
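A frequent cause of non-deterministic predictions in PyTorch is forgetting to call model.eval() before inference, which leaves dropout (and batch-norm updates) active. A numpy stand-in for the effect; forward here is a hypothetical network with a dropout layer, not the questioner's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, train=True, p=0.5):
    # stand-in for a network with dropout: in training mode units are
    # randomly zeroed, so repeated calls on the same input differ
    h = np.tanh(x)
    if train:
        mask = rng.random(h.shape) > p
        h = h * mask / (1 - p)
    return h.sum()

x = np.linspace(-1, 1, 100)
a, b = forward(x, train=True), forward(x, train=True)
c, d = forward(x, train=False), forward(x, train=False)
print(a == b, c == d)  # training mode is stochastic, eval mode is repeatable
```

In PyTorch the fix is `model.eval()` (plus `torch.no_grad()` for efficiency) before prediction; `model.train()` switches the stochastic behaviour back on.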

What are h(t-1) and c(t-1) for the first LSTM cell?

I know that in an LSTM chain you connect the h(t) of the previous cell to the next cell, and likewise for c(t). But what about the first cell? What does it receive as h(t-1) and c(t-1)? I would also like to know: if we want to make a multi-layer LSTM, what should we give the first cell of the second layer as h and c? Also, do we throw away the h and c …
Category: Data Science
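The first cell has no predecessor, so h(t-1) and c(t-1) are conventionally initialised to zero vectors, which is the default in both Keras and PyTorch; learned initial states, or states carried over from the previous batch (stateful RNNs), are the usual alternatives. A second layer's first cell likewise starts from zeros; what it receives per timestep is the first layer's h, not its c. A shape sketch with illustrative sizes:

```python
import numpy as np

batch, hidden1, hidden2 = 4, 16, 8

# layer 1, t = 0: no previous cell, so start from zeros (framework default)
h1 = np.zeros((batch, hidden1))
c1 = np.zeros((batch, hidden1))

# layer 2, t = 0: also zeros; at each timestep its *input* is layer 1's h1,
# while layer 1's c1 stays internal to layer 1
h2 = np.zeros((batch, hidden2))
c2 = np.zeros((batch, hidden2))
```

In PyTorch, `nn.LSTM` accepts an optional `(h_0, c_0)` pair and defaults to zeros when it is omitted.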

Splitting and training multiple datasets at the same time

I've got 15 different datasets of about 10 GB each. Each dataset comes with a binary 2D ground truth (10486147-ish, 1) that I pull from it. I'm trying to figure out how to load each dataset, split them all with scikit-learn's train_test_split, and then iterate over all 15 datasets per epoch. Under normal circumstances the datasets would be shuffled as well, but I cannot figure out how to do even that, since the data is too large to load all at once …
Category: Data Science
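One workable pattern is to split each dataset once by index, keep only the small index arrays in memory, and then stream one dataset at a time within each epoch, reshuffling the training indices per epoch. A sketch with tiny synthetic arrays standing in for the 10 GB files; load_dataset is a hypothetical lazy loader, not an API from the question:

```python
import numpy as np

# hypothetical lazy loader: in practice each call would memory-map or stream
# one ~10 GB dataset from disk instead of keeping all 15 in RAM
def load_dataset(i, n=1000, d=5):
    rng = np.random.default_rng(i)
    return rng.normal(size=(n, d)), (rng.random(n) > 0.5).astype(int)

def split_indices(n, test_frac=0.2, seed=0):
    idx = np.random.default_rng(seed).permutation(n)
    cut = int(n * (1 - test_frac))
    return idx[:cut], idx[cut:]

# split each dataset once, storing only the (small) index arrays
splits = {i: split_indices(1000, seed=i) for i in range(15)}

for epoch in range(2):
    for i in range(15):                 # one dataset in memory at a time
        X, y = load_dataset(i)
        train_idx, test_idx = splits[i]
        np.random.default_rng(epoch).shuffle(train_idx)  # fresh shuffle per epoch
        X_train, y_train = X[train_idx], y[train_idx]
        # model.train_on_batch(X_train, y_train) would go here
```

Splitting indices rather than data is the key trick; `train_test_split(np.arange(n))` from scikit-learn does the same job as the split_indices helper here.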

Stacking LSTM layers

Can someone please tell me the difference between these stacked LSTM layers? The first image is from this question and the second is from this article. So far, what I have learned about stacking LSTM layers was based on the second image: when you build layers of LSTM where the output of one layer (which is $h^{1}_{l}, l=..., t-1, t, t+1...$) becomes the input of the next, it is called stacking. In stacked LSTMs, each LSTM layer outputs a sequence of vectors which …
Category: Data Science
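The second image's scheme can be written out directly: at every timestep, layer 1 consumes the raw input and layer 2 consumes layer 1's hidden state at that same timestep. A simplified tanh cell stands in for the full LSTM here; the weights are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_in, d1, d2 = 5, 3, 8, 4  # timesteps, input dim, layer-1 units, layer-2 units

W1, U1 = rng.normal(size=(d_in, d1)), rng.normal(size=(d1, d1))
W2, U2 = rng.normal(size=(d1, d2)), rng.normal(size=(d2, d2))

x = rng.normal(size=(T, d_in))
h1, h2 = np.zeros(d1), np.zeros(d2)
outs = []
for t in range(T):
    # layer 1 consumes the raw input at timestep t
    h1 = np.tanh(x[t] @ W1 + h1 @ U1)
    # layer 2 consumes layer 1's hidden state at the SAME timestep t
    h2 = np.tanh(h1 @ W2 + h2 @ U2)
    outs.append(h2)
print(np.array(outs).shape)  # (5, 4): a sequence of layer-2 hidden states
```

This is why Keras requires `return_sequences=True` on every stacked LSTM layer except possibly the last: the upper layer needs the full sequence of hidden states, not just the final one.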

Initialising states in a multilayer sequence to sequence model

In a sequence-to-sequence model where the encoder and decoder each consist of one layer, the initial state of the decoder is initialised with the final states of the encoder layer. In the case of a multi-layer sequence-to-sequence model with many layers in both the encoder and the decoder, should every layer in the decoder be initialised with the final state of the corresponding encoder layer, or just the first layer of the decoder, and …
Category: Data Science
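Both options from the question can be written down explicitly. To my knowledge the more common choice in multi-layer seq2seq implementations is to carry each encoder layer's final (h, c) into the decoder layer at the same depth, but initialising only the bottom decoder layer (zeros elsewhere) also appears in practice. A sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, batch, hidden = 3, 2, 8

# hypothetical final (h, c) states of each encoder layer, bottom to top
encoder_final = [(rng.normal(size=(batch, hidden)), rng.normal(size=(batch, hidden)))
                 for _ in range(num_layers)]

# option 1: decoder layer i starts from encoder layer i's final state
decoder_init_per_layer = list(encoder_final)

# option 2: only the first decoder layer gets the top encoder state,
# and the remaining layers start from zeros
zeros = lambda: (np.zeros((batch, hidden)), np.zeros((batch, hidden)))
decoder_init_first_only = [encoder_final[-1]] + [zeros() for _ in range(num_layers - 1)]
```

Option 1 requires the encoder and decoder to share the layer count and hidden size; option 2 does not, which is one practical reason it shows up.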

Connect a dense layer to an LSTM architecture

I am trying to implement an LSTM structure in plain numpy for didactic reasons. I clearly understand how to input the data, but not how to output it. Suppose I give as input a tensor of dimension (n, b, d) where: • n is the length of the sequence • b is the batch size (timestamps in my case) • d is the number of features for each example. Each example (row) in the dataset is labelled 0 or 1. However, when I feed …
Category: Data Science
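For a per-sequence 0/1 label, the usual hookup is to take only the last timestep's hidden state and pass it through a dense layer with a sigmoid. With hidden states shaped (n, b, d_hidden), that means indexing the time axis at -1. A numpy sketch; H stands in for the LSTM's per-timestep hidden states:

```python
import numpy as np

rng = np.random.default_rng(0)
n, b, hidden = 7, 4, 16  # sequence length, batch size, LSTM units

# stand-in for the LSTM forward pass: one hidden state per timestep and example
H = rng.normal(size=(n, b, hidden))

# feed only the LAST hidden state to the dense layer: one score per example
W = rng.normal(size=(hidden, 1))
bias = np.zeros(1)
logits = H[-1] @ W + bias           # (b, 1)
probs = 1 / (1 + np.exp(-logits))   # sigmoid for the binary label
print(probs.shape)  # (4, 1)
```

Averaging H over the time axis before the dense layer is a common alternative when the label depends on the whole sequence rather than its end.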

Why does my LSTM perform better when randomizing training subset vs. standard batch training?

I am training a simple LSTM network using Keras to predict time series values. It is a simple 2-layer LSTM. I get the best performance when I train on subsets of the training set that start at random points. Each subset has a training size of 100 samples and a validation size of 30 samples. For each subset, the model uses a batch size of 16 and trains for 100 epochs with early stopping after 20 epochs of little improvement. …
Category: Data Science

Dropout on which layers of LSTM?

When using a multi-layer LSTM with dropout, is it advisable to put dropout on all hidden layers as well as the output Dense layers? In Hinton's paper (which proposed dropout) he only put dropout on the Dense layers, but that was because the hidden inner layers were convolutional. Obviously I can test this for my specific model, but I wondered whether there is a consensus.
Category: Data Science
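For reference, Keras LSTM layers expose two separate arguments: dropout (applied to the layer's inputs) and recurrent_dropout (applied to the recurrent connections), so dropout between stacked LSTM layers does not require a separate Dropout layer. Both use inverted dropout, which rescales the kept units at training time so inference needs no rescaling; a numpy sketch of that mechanic:

```python
import numpy as np

def inverted_dropout(h, p, rng, train=True):
    """Inverted dropout: scale kept units by 1/(1-p) so eval needs no rescaling."""
    if not train or p == 0.0:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

rng = np.random.default_rng(0)
h = np.ones((4, 10, 16))  # (batch, timesteps, units) from an LSTM layer
out = inverted_dropout(h, p=0.25, rng=rng)
print(out.mean())  # close to 1.0: the expected activation value is preserved
```

Note that naive per-timestep recurrent dropout is known to hurt LSTMs; implementations typically hold one mask fixed across all timesteps of a sequence (variational dropout), which is what Keras's recurrent_dropout does.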

How is error back-propagated in a multi-layer RNN

Let's say I have a 2-layer LSTM network, and I'm using it to perform regression on input sequences of length 10 along the time axis. From what I understand, when this network is 'unfolded' it will consist of 20 LSTM cells, 10 per layer. So the 10 cells corresponding to the first layer receive the network input for t = 1 to 10, whereas the 10 cells corresponding to the second layer receive the first layer's output …
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.