How to prepare data for LSTM time series prediction

I have a binary classification task for time series data. Every 14 rows in my CSV correspond to one time slot. How should I prepare this data for use in an LSTM? In other words, how do I feed this data to the model?


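Here is a sketch of the code for this:

import pandas as pd
import numpy as np

data = pd.read_csv(filename)  # filename: path to your CSV file
lag = 14  # number of past rows in each input window

# Assuming the target column is the last one
X = []
Y = []
for x in range(lag, len(data)):
    # The previous `lag` rows (all columns) form one input sample
    X.append(data.iloc[x - lag:x, :].to_numpy())
    # The target value of the current row is the label
    Y.append(data.iloc[x, -1])
X = np.array(X)  # shape: (samples, lag, columns)
Y = np.array(Y)  # shape: (samples,)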
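
The loop above builds overlapping sliding windows. If "every 14 rows is one time slot" instead means that each consecutive block of 14 rows is a single sample, a non-overlapping reshape may be closer to what is needed. The following is only a sketch under that assumption: the helper name blocks_to_samples, the synthetic data, and the rule that a block takes the label of its last row are all made up for illustration.

```python
import numpy as np

def blocks_to_samples(values, labels, steps=14):
    """Group each consecutive run of `steps` rows into one sample."""
    n_samples = len(values) // steps   # an incomplete trailing block is dropped
    usable = n_samples * steps
    X = values[:usable].reshape(n_samples, steps, values.shape[1])
    # Assumption: one label per block, taken from the block's last row.
    y = labels[:usable].reshape(n_samples, steps)[:, -1]
    return X, y

# Synthetic stand-in for the CSV: 42 rows, 12 feature columns, 3 blocks.
rng = np.random.default_rng(0)
values = rng.normal(size=(42, 12))
labels = np.repeat([0, 1, 0], 14)

X, y = blocks_to_samples(values, labels)
print(X.shape, y.shape)  # (3, 14, 12) (3,)
```

The resulting X has the (samples, timesteps, features) layout that Keras-style LSTM layers expect.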

I'm not sure what you mean by "Every 14 rows in my CSV is relevant to one time slot."; it's not clear to me.

But going by your comment "How should I load this data to LSTM? So the number of column is 12", I believe you are asking how to load multiple features (12 in your case) into a time series model.

If my understanding is correct, it's a problem of the "Multiple Parallel Time Series" type. I have created a similar model in TensorFlow and pushed it to GitHub. GitHub Source Code for Multiple Parallel TimeSeries

Note: Here instead of 12 features, I have used 3 features.
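
To make the "multiple parallel series" idea concrete, here is a minimal sketch of the data preparation step with 3 features. It is not the code from the repository mentioned above; the function name split_parallel_series and the toy series are assumptions made for illustration. Each input sample is a window of n_steps consecutive rows, and the output is the row that follows the window.

```python
import numpy as np

def split_parallel_series(series, n_steps):
    """Return (X, y): X[i] holds n_steps consecutive rows, y[i] the next row."""
    X, y = [], []
    for end in range(n_steps, len(series)):
        X.append(series[end - n_steps:end])
        y.append(series[end])
    return np.array(X), np.array(y)

# Three toy parallel series stacked as columns (hypothetical data).
a = np.arange(10, 100, 10)           # 10, 20, ..., 90
b = a + 5
c = a + b
series = np.column_stack([a, b, c])  # shape (9, 3)

X, y = split_parallel_series(series, n_steps=3)
print(X.shape, y.shape)  # (6, 3, 3) (6, 3)
```

With 12 features the only change would be the number of columns in series; X would then have shape (samples, n_steps, 12).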


I hope the dataset also contains metadata, meaning you need a one-to-one mapping from each tuple to its label, e.g. dog > good, cat > bad, kittens > bad, puppies > good, etc.

Separate the data into X (training data) and Y (labels). Then use a vectorizer and train on X and Y. Once those steps work, use methods such as a train/test split, cross-validation folds, etc.
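
As a rough illustration of the X/Y separation and hold-out step, the sketch below splits already-windowed data chronologically, so the latest samples are kept for testing. This is one possible choice for time-ordered data, not something the answer prescribes; the function name chronological_split, the 20% test fraction, and the dummy arrays are assumptions.

```python
import numpy as np

def chronological_split(X, y, test_fraction=0.2):
    """Keep the earliest samples for training and the latest for testing."""
    n_test = max(1, int(round(len(X) * test_fraction)))
    split = len(X) - n_test
    return X[:split], X[split:], y[:split], y[split:]

# Dummy data: 10 samples, 14 time steps, 12 features, binary labels.
X = np.arange(10 * 14 * 12, dtype=float).reshape(10, 14, 12)
y = np.array([0, 1, 0, 1, 1, 0, 0, 1, 0, 1])

X_train, X_test, y_train, y_test = chronological_split(X, y)
print(X_train.shape, X_test.shape)  # (8, 14, 12) (2, 14, 12)
```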

Friendly suggestion: Try seq2seq layers before LSTM (they require more resources).
