How to detect anomalies?
I have time series data with one value per day for a year (a single column of temperature data). I am using an autoencoder to train a reconstruction model with MSE loss.
Firstly, I normalized the data using the following code:
training_mean = preprocessed_data.mean()
training_std = preprocessed_data.std()
df_training_value = (preprocessed_data - training_mean) / training_std
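One thing worth checking at this step is that the test data is scaled with the training statistics rather than its own. The snippet below is only a minimal sketch of that idea: `train_values` and `test_values` are hypothetical stand-ins for the temperature column, and it uses plain NumPy arrays (note that NumPy's `std()` defaults to `ddof=0`, while pandas defaults to `ddof=1`).

```python
import numpy as np

# Hypothetical stand-ins for the training and test temperature columns.
train_values = np.array([10.0, 12.0, 14.0, 16.0])
test_values = np.array([13.0, 30.0])

# Statistics are computed on the training data only.
training_mean = train_values.mean()
training_std = train_values.std()

# Both sets are scaled with the same training statistics.
train_scaled = (train_values - training_mean) / training_std
test_scaled = (test_values - training_mean) / training_std
```

If the test set were scaled with its own mean and standard deviation, the model would see inputs on a different scale than it was trained on, which can inflate the reconstruction error.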
After this I build sequences from the data. I am not sure whether 32 time steps is a good choice, but otherwise I can't fit the model.
TIME_STEPS = 32

def create_sequences(values, time_steps=TIME_STEPS):
    output = []
    for i in range(len(values) - time_steps + 1):
        output.append(values[i : (i + time_steps)])
    return np.stack(output)

x_train = create_sequences(df_training_value.values)
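To make the effect of the windowing concrete, here is a small sketch of the same function applied to a year of daily values. The `values` array is a hypothetical stand-in for `df_training_value.values`, assumed to have 365 rows and one column.

```python
import numpy as np

TIME_STEPS = 32

def create_sequences(values, time_steps=TIME_STEPS):
    # Slide a window of length time_steps over the series, one step at a time.
    output = []
    for i in range(len(values) - time_steps + 1):
        output.append(values[i : i + time_steps])
    return np.stack(output)

# Hypothetical stand-in for df_training_value.values: 365 days, one column.
values = np.arange(365, dtype=float).reshape(-1, 1)
x_train = create_sequences(values)
print(x_train.shape)  # (334, 32, 1)
```

With 365 points and 32 time steps there are only 365 - 32 + 1 = 334 heavily overlapping windows, which is a small training set for an autoencoder.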
After this I build a sequential model whose first layers are:
layers.Input(shape=(x_train.shape[1], x_train.shape[2])),
layers.Conv1D(
    filters=32, kernel_size=2, padding="same", strides=2, activation="relu"
),
...
After that, I compute the MAE loss on the training and test data and compare the test loss with a threshold, as I have seen in many tutorials.
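For reference, the thresholding step described here can be sketched roughly as follows. This is an illustration under assumptions, not the exact code from the post: `window_mae` is a hypothetical helper, and the `*_pred` arrays are stand-ins for what `model.predict` would return, generated here as the input plus small noise.

```python
import numpy as np

def window_mae(x, x_pred):
    """Mean absolute error per window; both inputs have shape (n, time_steps, 1)."""
    return np.mean(np.abs(x_pred - x), axis=(1, 2))

rng = np.random.default_rng(0)

# Hypothetical training windows and stand-in reconstructions.
x_train = rng.normal(size=(50, 32, 1))
x_train_pred = x_train + rng.normal(scale=0.1, size=x_train.shape)

train_mae_loss = window_mae(x_train, x_train_pred)
# Many tutorials use the maximum training loss as the threshold;
# a high percentile of train_mae_loss is a common alternative.
threshold = np.max(train_mae_loss)

# Hypothetical test windows and stand-in reconstructions.
x_test = rng.normal(size=(20, 32, 1))
x_test_pred = x_test + rng.normal(scale=0.1, size=x_test.shape)

test_mae_loss = window_mae(x_test, x_test_pred)
anomalies = test_mae_loss > threshold  # one boolean per test window
```

A threshold taken from the training loss only works if the model reconstructs normal test data about as well as training data. If the test loss is larger everywhere, every window ends up above the threshold.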
anomalous_data_indices = []
for data_idx in range(0, len(df_test_value)):  # (TIME_STEPS - 1, len(df_test_value) - TIME_STEPS + 1):
    if np.all(anomalies[data_idx - TIME_STEPS + 1 : data_idx]):
        anomalous_data_indices.append(data_idx)
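One detail of the loop above that may contribute to the result: when `data_idx` is smaller than `TIME_STEPS - 1`, the slice start is negative and the slice is typically empty, and `np.all` of an empty array is `True`. The sketch below compares the two ranges on a toy example in which no window is flagged. `anomalous_point_indices` is a hypothetical helper, and `TIME_STEPS` is reduced to 4 only to keep the example small.

```python
import numpy as np

TIME_STEPS = 4  # small value for illustration; the post uses 32

def anomalous_point_indices(anomalies, n_points, time_steps=TIME_STEPS):
    """Indices of points for which all windows in the slice are flagged."""
    indices = []
    # Starting at time_steps - 1 keeps the slice start non-negative.
    for data_idx in range(time_steps - 1, n_points - time_steps + 1):
        if np.all(anomalies[data_idx - time_steps + 1 : data_idx]):
            indices.append(data_idx)
    return indices

n_points = 10
# One flag per window; here no window is flagged at all.
anomalies = np.zeros(n_points - TIME_STEPS + 1, dtype=bool)

restricted = anomalous_point_indices(anomalies, n_points)

# The same check over range(0, n_points), as in the loop from the post.
unrestricted = [
    i for i in range(n_points)
    if np.all(anomalies[i - TIME_STEPS + 1 : i])
]

print(restricted)    # []
print(unrestricted)  # [0, 1, 2]
```

Even though no window is flagged, the unrestricted loop reports the first `TIME_STEPS - 1` points as anomalous. That alone would not explain every point being flagged, but it does add false positives at the start of the series.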
The problem is that the test loss is always larger than the training loss, so every value is flagged as an anomaly. Should I calculate the loss on the training and test samples using a different method? Thank you in advance!