How to correctly format the dimensions of the validation set - time series

I'm trying to understand how to add my validation data into my LSTM. At the moment I'm loading the train and the test set in the following way:

  1. First of all I load my time series from a directory, where they have a 2D shape (#values, #n_features = 30):

    self.train = np.load(os.path.join("data", "train", "X0train_s30.npy"))
    self.test = np.load(os.path.join("data", "test", "X0test_s30.npy"))
    # Shape for LSTM
    self.shape_data(self.train)
    self.shape_data(self.test, train=False)
    
  2. Then I shape it to prepare the input for the LSTM. Since these are time series, I chose a number of time steps:

    def shape_data(self, arr, train=True):
       """Shape raw input streams for ingestion into LSTM. config.l_s specifies
       the sequence length of prior timesteps fed into the model at
       each timestep t.

       Args:
           arr (np array): array of input streams with
               dimensions [timesteps, 1, input dimensions]
           train (bool): If shaping training data, this indicates
               data can be shuffled
       """
       data = []
       for i in range(len(arr) - self.config.l_s - self.config.n_predictions):
           data.append(arr[i:i + self.config.l_s + self.config.n_predictions])
       data = np.array(data)
    
    
       assert len(data.shape) == 3
    
       if train:
           np.random.shuffle(data)
           self.X_train = data[:, :-self.config.n_predictions, :]
           self.y_train = data[:, -self.config.n_predictions:, 0]  # telemetry value is at position 0
       else:
           self.X_test = data[:, :-self.config.n_predictions, :]
           self.y_test = data[:, -self.config.n_predictions:, 0]  # telemetry value is at position 0
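
To see what shapes this windowing produces, here is a minimal sketch on a toy array. It uses the same loop and slicing as `shape_data`, but the function name `make_windows` and the toy sizes (20 timesteps, 3 features, `l_s=5`, `n_predictions=2`) are assumptions made for illustration, not part of the original code.

```python
import numpy as np


def make_windows(arr, l_s, n_predictions):
    """Slice a 2D series [timesteps, features] into overlapping windows.

    Hypothetical helper mirroring the loop in shape_data (without shuffling).
    """
    window = l_s + n_predictions
    # Same range as in shape_data: len(arr) - l_s - n_predictions windows.
    data = np.array([arr[i:i + window] for i in range(len(arr) - window)])
    assert data.ndim == 3

    X = data[:, :-n_predictions, :]   # first l_s steps, all features
    y = data[:, -n_predictions:, 0]   # last n_predictions steps, feature 0 only
    return X, y


# Toy series: value at row r, column c is 3 * r + c.
series = np.arange(20 * 3, dtype=float).reshape(20, 3)
X, y = make_windows(series, l_s=5, n_predictions=2)

print(X.shape)  # (13, 5, 3): 3D input for the LSTM
print(y.shape)  # (13, 2): 2D target, one column per predicted step
```

The inputs stay 3D (samples, timesteps, features), while the targets are 2D (samples, n_predictions) because only feature 0 is kept.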
    

This way the final train shape is (51324, 250, 30) and the final test shape is (51324, 10). This is the model I built, which works perfectly:

cbs = [History(), EarlyStopping(monitor='val_loss',
                                    patience=self.config.patience,
                                    min_delta=self.config.min_delta,
                                    verbose=0)]

self.model = Sequential()

self.model.add(LSTM(
    self.config.layers[0],
    input_shape=(None, channel.X_train.shape[2]),
    return_sequences=True))
self.model.add(Dropout(self.config.dropout))

self.model.add(LSTM(
    self.config.layers[1],
    return_sequences=False))
self.model.add(Dropout(self.config.dropout))

self.model.add(Dense(
    self.config.n_predictions))
self.model.add(Activation('linear'))

self.model.compile(loss=self.config.loss_metric,
                   optimizer=self.config.optimizer)

self.model.fit(channel.X_train,
               channel.y_train,
               batch_size=self.config.lstm_batch_size,
               epochs=self.config.epochs,
               validation_split=self.config.validation_split,
               callbacks=cbs,
               verbose=True)

Now I have decided to add a validation .npy array, which I load and shape in the same format as the train set (is this correct?):

def shape_val(self, arr, train=True):
    data = []
    for i in range(len(arr) - self.config.l_s - self.config.n_predictions):
        data.append(arr[i:i + self.config.l_s + self.config.n_predictions])
    data = np.array(data)
    assert len(data.shape) == 3
    np.random.shuffle(data)
    self.X_val = data[:, :-self.config.n_predictions, :]

Then I changed the fit call of my model in the following way:

self.model.fit(channel.X_train,
               channel.y_train,
               batch_size=self.config.lstm_batch_size,
               epochs=self.config.epochs,
               validation_data=(channel.X_val, channel.X_val),
               callbacks=cbs,
               verbose=True)

If I try to run it I get the error:

File "/opt/anaconda3/envs/telemanom/lib/python3.6/site-packages/keras/engine/training.py", line 1175, in fit
    batch_size=batch_size)

File "/opt/anaconda3/envs/telemanom/lib/python3.6/site-packages/keras/engine/training.py", line 621, in _standardize_user_data
    exception_prefix='target')

File "/opt/anaconda3/envs/telemanom/lib/python3.6/site-packages/keras/engine/training_utils.py", line 135, in standardize_input_data
    'with shape ' + str(data_shape))

ValueError: Error when checking target: expected activation_1 to have 2 dimensions, but got array with shape (13769, 250, 30)

Am I formatting the validation set correctly? Or am I missing something important?
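
One possible reading of the error: the message complains about the target, and the second element of `validation_data` in the `fit` call is `channel.X_val`, a 3D array of shape (13769, 250, 30). The model ends in `Dense(n_predictions)`, so it expects 2D targets, as it gets from `y_train`. `shape_val` also never builds a `y_val`. The sketch below shows how inputs and targets could both be sliced from the windowed validation array. The helper name `split_inputs_targets`, the toy sizes, and the random data are assumptions for illustration only.

```python
import numpy as np


def split_inputs_targets(data, n_predictions, target_col=0):
    """Split windowed data [samples, l_s + n_predictions, features] into X and y.

    Hypothetical helper; target_col=0 follows the "telemetry value is at
    position 0" comment in shape_data.
    """
    if data.ndim != 3:
        raise ValueError("expected a 3D array of windows, got %dD" % data.ndim)
    X = data[:, :-n_predictions, :]            # 3D: fed to the LSTM
    y = data[:, -n_predictions:, target_col]   # 2D: compared with the Dense output
    return X, y


# Toy stand-in for the windowed validation array (8 windows, 5 + 2 steps, 3 features).
rng = np.random.default_rng(0)
val_windows = rng.normal(size=(8, 7, 3))

X_val, y_val = split_inputs_targets(val_windows, n_predictions=2)
print(X_val.shape)  # (8, 5, 3)
print(y_val.shape)  # (8, 2)

# With arrays shaped like this, the fit call would pass inputs AND targets:
# model.fit(X_train, y_train, validation_data=(X_val, y_val), ...)
```

In terms of the code above, this would mean setting `self.y_val = data[:, -self.config.n_predictions:, 0]` in `shape_val` and passing `validation_data=(channel.X_val, channel.y_val)`. Shuffling the validation windows is not needed for evaluation, although it does no harm as long as `X_val` and `y_val` are sliced from the same shuffled array.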
