How to detect anomalies?

warriorforce

May 15, 2022, 11:27

I have time series data with one value per day for a year (a single column of temperature data). I am using an autoencoder to train a reconstruction model with MSE loss.

Firstly, I normalized the data using the following code:

training_mean = preprocessed_data.mean()
training_std = preprocessed_data.std()
df_training_value = (preprocessed_data - training_mean) / training_std

After this, I split the data into sequences. I am not sure whether 32 time steps is a good choice, but otherwise I can't fit the model.

import numpy as np

TIME_STEPS = 32

def create_sequences(values, time_steps=TIME_STEPS):
    # Slide a window of length time_steps over values, one step at a time.
    output = []
    for i in range(len(values) - time_steps + 1):
        output.append(values[i : (i + time_steps)])
    return np.stack(output)

x_train = create_sequences(df_training_value.values)
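
As a quick sanity check on the windowing step, here is a minimal sketch of the shapes it produces, assuming 365 daily values in a single column. Synthetic random data stands in for the temperature series (an assumption, not the real data), and `create_sequences` is repeated so the snippet runs on its own:

```python
import numpy as np

TIME_STEPS = 32

def create_sequences(values, time_steps=TIME_STEPS):
    # Slide a window of length time_steps over values, one step at a time.
    output = []
    for i in range(len(values) - time_steps + 1):
        output.append(values[i : (i + time_steps)])
    return np.stack(output)

# Synthetic stand-in for one year of normalized daily temperatures (assumption).
rng = np.random.default_rng(0)
values = rng.normal(size=(365, 1))

x_train = create_sequences(values)
print(x_train.shape)  # (334, 32, 1): 365 - 32 + 1 overlapping windows
```

With only 365 points, the 334 windows overlap heavily, so the model sees far fewer independent examples than the window count suggests.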

After this, I build a Sequential model whose first layers are:

layers.Input(shape=(x_train.shape[1], x_train.shape[2])),
layers.Conv1D(
    filters=32, kernel_size=2, padding="same", strides=2, activation="relu"
)
...

After that, I compute the MAE reconstruction loss on the training and test data and compare the test loss with a threshold, as I have seen in many tutorials.
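
The thresholding step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the arrays are synthetic, the "reconstructions" are the inputs plus noise rather than real model output (in practice they would come from something like `model.predict(x_train)`), and `reconstruction_mae` is a hypothetical helper name. Taking the maximum training MAE as the threshold is one choice seen in tutorials; a high percentile is another.

```python
import numpy as np

def reconstruction_mae(x, x_pred):
    """Mean absolute error per window, averaged over time steps and features."""
    return np.mean(np.abs(x_pred - x), axis=(1, 2))

# Synthetic stand-ins (assumption): windows and their "reconstructions".
rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 32, 1))
x_train_pred = x_train + rng.normal(scale=0.1, size=x_train.shape)
x_test = rng.normal(size=(20, 32, 1))
x_test_pred = x_test + rng.normal(scale=0.1, size=x_test.shape)
x_test_pred[5] += 3.0  # make one test window reconstruct badly on purpose

train_mae = reconstruction_mae(x_train, x_train_pred)
test_mae = reconstruction_mae(x_test, x_test_pred)

threshold = np.max(train_mae)     # largest error seen on training windows
anomalies = test_mae > threshold  # one boolean per test window
print(threshold, anomalies[5])
```

Note that `anomalies` has one entry per window, not one per original data point, which matters when mapping flags back to days.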

anomalous_data_indices = []
for data_idx in range(0, len(df_test_value)): #(TIME_STEPS - 1, len(df_test_value) - TIME_STEPS + 1):
    if np.all(anomalies[data_idx - TIME_STEPS + 1 : data_idx]):
        anomalous_data_indices.append(data_idx)
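
One thing worth checking in the loop above (a guess based only on the code shown, not a confirmed diagnosis): when `data_idx` is smaller than `TIME_STEPS - 1`, the slice start is negative while the stop is small, so the slice is empty, and `np.all` of an empty array is `True`. A minimal sketch with a made-up `anomalies` array in which no window is flagged:

```python
import numpy as np

TIME_STEPS = 32
anomalies = np.zeros(100, dtype=bool)  # hypothetical: nothing is anomalous

data_idx = 5
window = anomalies[data_idx - TIME_STEPS + 1 : data_idx]  # anomalies[-26:5]
print(window.size)     # 0, the slice is empty
print(np.all(window))  # True, even though no window was flagged

# Same check as the loop, with the two candidate start values for the range.
flagged_from_zero = [
    i for i in range(0, len(anomalies))
    if np.all(anomalies[i - TIME_STEPS + 1 : i])
]
flagged_from_offset = [
    i for i in range(TIME_STEPS - 1, len(anomalies))
    if np.all(anomalies[i - TIME_STEPS + 1 : i])
]
print(len(flagged_from_zero), len(flagged_from_offset))  # 31 0
```

Starting the range at `TIME_STEPS - 1`, as in the commented-out bounds, avoids the empty slices in this toy case.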

The problem is that the test loss is always larger than the training loss, so every value is flagged as an anomaly. Should I calculate the loss on the training and test samples using a different method? Thank you in advance!

Topic autoencoder anomaly-detection neural-network

Category Data Science
