How to correctly format the dimensions of the validation set for time series
I'm trying to understand how to add my validation data into my LSTM. At the moment I'm loading the train and the test set in the following way:
First of all I load my time series from a directory, where they have a 2D shape (#values, #n_features = 30):
self.train = np.load(os.path.join("data", "train", "X0train_s30.npy"))
self.test = np.load(os.path.join("data", "test", "X0test_s30.npy"))

# Shape for LSTM
self.shape_data(self.train)
self.shape_data(self.test, train=False)
Then I shape the data to prepare the input for the LSTM. Since these are time series, I chose a number of time steps per window:
def shape_data(self, arr, train=True):
    """Shape raw input streams for ingestion into LSTM. config.l_s specifies
    the sequence length of prior timesteps fed into the model at
    each timestep t.

    Args:
        arr (np array): array of input streams with
            dimensions [timesteps, 1, input dimensions]
        train (bool): If shaping training data, this indicates
            data can be shuffled
    """
    data = []
    for i in range(len(arr) - self.config.l_s - self.config.n_predictions):
        data.append(arr[i:i + self.config.l_s + self.config.n_predictions])
    data = np.array(data)

    assert len(data.shape) == 3

    if train:
        np.random.shuffle(data)
        self.X_train = data[:, :-self.config.n_predictions, :]
        self.y_train = data[:, -self.config.n_predictions:, 0]  # telemetry value is at position 0
    else:
        self.X_test = data[:, :-self.config.n_predictions, :]
        self.y_test = data[:, -self.config.n_predictions:, 0]  # telemetry value is at position 0
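To make the resulting shapes concrete, here is a minimal standalone sketch of the same windowing logic. The function name `make_windows` and the toy array size are hypothetical and used only for illustration; the slicing mirrors `shape_data` above.

```python
import numpy as np

def make_windows(arr, l_s, n_predictions):
    """Slice a 2D array [timesteps, features] into overlapping windows.

    Returns X with shape (n_windows, l_s, features) and
    y with shape (n_windows, n_predictions), taken from feature 0.
    """
    n_windows = len(arr) - l_s - n_predictions
    data = np.array([arr[i:i + l_s + n_predictions] for i in range(n_windows)])
    X = data[:, :-n_predictions, :]   # first l_s steps, all features
    y = data[:, -n_predictions:, 0]   # last n_predictions steps, feature 0 only
    return X, y

# Hypothetical toy data: 300 timesteps, 30 features
toy = np.random.rand(300, 30)
X, y = make_windows(toy, l_s=250, n_predictions=10)
print(X.shape)  # (40, 250, 30)
print(y.shape)  # (40, 10)
```

The inputs are 3D and the targets are 2D, which matches a network that ends in `Dense(n_predictions)`.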
This way, the final train shape is (51324, 250, 30) and the final test shape is (51324, 10). This is the model I built, which works perfectly:
cbs = [History(), EarlyStopping(monitor='val_loss',
                                patience=self.config.patience,
                                min_delta=self.config.min_delta,
                                verbose=0)]

self.model = Sequential()

self.model.add(LSTM(
    self.config.layers[0],
    input_shape=(None, channel.X_train.shape[2]),
    return_sequences=True))
self.model.add(Dropout(self.config.dropout))

self.model.add(LSTM(
    self.config.layers[1],
    return_sequences=False))
self.model.add(Dropout(self.config.dropout))

self.model.add(Dense(
    self.config.n_predictions))
self.model.add(Activation('linear'))

self.model.compile(loss=self.config.loss_metric,
                   optimizer=self.config.optimizer)

self.model.fit(channel.X_train,
               channel.y_train,
               batch_size=self.config.lstm_batch_size,
               epochs=self.config.epochs,
               validation_split=self.config.validation_split,
               callbacks=cbs,
               verbose=True)
Now I have decided to add a validation .npy array, which I loaded and shaped in the same format as the train set (is this correct?):
def shape_val(self, arr, train=True):
    data = []
    for i in range(len(arr) - self.config.l_s - self.config.n_predictions):
        data.append(arr[i:i + self.config.l_s + self.config.n_predictions])
    data = np.array(data)

    assert len(data.shape) == 3

    np.random.shuffle(data)
    self.X_val = data[:, :-self.config.n_predictions, :]
Then I changed the fit call of my model in the following way:
self.model.fit(channel.X_train,
               channel.y_train,
               batch_size=self.config.lstm_batch_size,
               epochs=self.config.epochs,
               validation_data=(channel.X_val, channel.X_val),
               callbacks=cbs,
               verbose=True)
If I try to run it I get the error:
File /opt/anaconda3/envs/telemanom/lib/python3.6/site-packages/keras/engine/training.py, line 1175, in fit
    batch_size=batch_size)
File /opt/anaconda3/envs/telemanom/lib/python3.6/site-packages/keras/engine/training.py, line 621, in _standardize_user_data
    exception_prefix='target')
File /opt/anaconda3/envs/telemanom/lib/python3.6/site-packages/keras/engine/training_utils.py, line 135, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking target: expected activation_1 to have 2 dimensions, but got array with shape (13769, 250, 30)
Am I formatting the validation set correctly? Or am I missing something important?
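One possible reading of the traceback: the shape it reports for the target, (13769, 250, 30), looks like a windowed input array rather than a 2D target of shape (n_windows, n_predictions). `shape_val` only sets `X_val`, and the fit call passes `X_val` in both positions of `validation_data`. The sketch below is a standalone shape check under that assumption. The helper name `check_validation_pair` and the toy array are hypothetical, and the target slice copies the `y_train` slice from `shape_data`.

```python
import numpy as np

def check_validation_pair(x_val, y_val, n_features, n_predictions):
    """Return True if (x_val, y_val) have the shapes the model above expects.

    x_val: 3D, (n_windows, timesteps, n_features)
    y_val: 2D, (n_windows, n_predictions), matching the final Dense layer
    """
    x_ok = x_val.ndim == 3 and x_val.shape[2] == n_features
    y_ok = y_val.ndim == 2 and y_val.shape == (x_val.shape[0], n_predictions)
    return x_ok and y_ok

# Hypothetical windowed validation array: 5 windows of 250 + 10 steps, 30 features
data = np.random.rand(5, 260, 30)
X_val = data[:, :-10, :]   # inputs, same slicing as X_train
y_val = data[:, -10:, 0]   # targets, same slicing as y_train (assumption)

print(check_validation_pair(X_val, X_val, 30, 10))  # False
print(check_validation_pair(X_val, y_val, 30, 10))  # True
```

If this reading is right, the fit call would receive `validation_data=(channel.X_val, channel.y_val)`, with `y_val` built alongside `X_val` in `shape_val`.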
Topic reshape lstm keras time-series
Category Data Science