How to set a threshold value by looking at the loss distribution in an anomaly detection task

Question


user12

May 1, 2022 02:02

I am following this tutorial https://towardsdatascience.com/lstm-autoencoder-for-anomaly-detection-e1f4f2ee7ccf to use an LSTM autoencoder to detect anomalies in my unlabeled dataset. The tutorial plots the loss distribution, and I plotted the same loss distribution for my dataset, shown in the image below.

My question is how they set the threshold value by looking at the loss distribution. I also want to set a threshold by looking at my loss distribution, but it is not clear to me how to select it. The tutorial says: "By plotting the loss distribution of the calculated loss in the training set, we can determine a suitable threshold value for identifying an anomaly. In doing this, one can make sure that this threshold is set above the “noise level” so that false positives are not triggered."
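A minimal sketch of reading a threshold off the training loss distribution, assuming the training windows and the autoencoder's reconstructions are NumPy arrays of shape (samples, timesteps, features). Here `x_train` and `x_train_pred` are synthetic stand-ins, and `per_sample_mae` and `noise_level_threshold` are hypothetical helpers. It computes the per-sample mean absolute reconstruction error, prints a text histogram of that distribution, and reports two rule-of-thumb thresholds just above the bulk of the training loss: mean plus three standard deviations, and the maximum training loss. Neither rule is stated in the tutorial; both are assumptions used for illustration.

```python
import numpy as np

def per_sample_mae(x, x_hat):
    """Mean absolute reconstruction error for each sample.

    Assumes x and x_hat share the shape (n_samples, timesteps, n_features),
    like an LSTM autoencoder's input and output.
    """
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    # Average over every axis except the first (sample) axis.
    return np.abs(x - x_hat).mean(axis=tuple(range(1, x.ndim)))

def noise_level_threshold(train_loss, k=3.0):
    """Heuristic threshold above the "noise level" of the training loss.

    mean + k * std is an assumed rule of thumb, not the tutorial's method.
    """
    train_loss = np.asarray(train_loss, dtype=float)
    return train_loss.mean() + k * train_loss.std()

# Synthetic stand-ins for training windows and their reconstructions.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(500, 10, 1))
x_train_pred = x_train + rng.normal(scale=0.1, size=x_train.shape)

train_loss = per_sample_mae(x_train, x_train_pred)

# Text histogram of the loss distribution (what one would normally plot).
counts, edges = np.histogram(train_loss, bins=10)
for c, lo, hi in zip(counts, edges[:-1], edges[1:]):
    print(f"{lo:.3f}-{hi:.3f} {'#' * int(c // 5)}")

threshold = noise_level_threshold(train_loss)
print("mean + 3*std threshold:", round(float(threshold), 4))
print("max training loss:", round(float(train_loss.max()), 4))
print("fraction of training samples above threshold:",
      float((train_loss > threshold).mean()))
```

The idea is to pick a value past the right tail of the training loss histogram; the last printed line shows how many training samples would still be flagged at that threshold, which gives a rough sense of the false-positive rate on normal data.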

Topic: autoencoder, anomaly-detection

Category: Data Science

Jon Nordby · Accepted Answer · December 6, 2021 20:34

Without labeled data, it is not possible to estimate how many False Alarms (false positives) or Missed Detections (false negatives) an anomaly detection system will have.

What one can do is set a decision threshold based on how many positives (regardless of true or false) one is willing to accept. That number needs to be determined by your business circumstances. For example: you have 1000 potential fraud cases coming in for review per day, 1 hour to review the suspect cases, and spend 0.1 hours on average per case. Then you have the capacity to review 10 cases, or 1% of the total. So set the anomaly score threshold at the 99th percentile, expecting to flag around 1% of cases.
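The capacity arithmetic above can be sketched as follows, assuming one day's anomaly scores (for example, reconstruction losses) are in a NumPy array. The scores here are randomly generated stand-ins, and `capacity_threshold` is a hypothetical helper. It converts review capacity into a flag fraction and then into a percentile threshold with `np.percentile`.

```python
import numpy as np

def capacity_threshold(scores, cases_per_day, review_hours, hours_per_case):
    """Return a score threshold that flags roughly the reviewable fraction."""
    capacity = review_hours / hours_per_case       # 1 / 0.1 = 10 cases
    flag_fraction = capacity / cases_per_day       # 10 / 1000 = 1%
    percentile = 100.0 * (1.0 - flag_fraction)     # 99th percentile
    return np.percentile(scores, percentile), capacity, flag_fraction

rng = np.random.default_rng(42)
# Hypothetical anomaly scores for one day's 1000 cases.
scores = rng.gamma(shape=2.0, scale=0.05, size=1000)

threshold, capacity, frac = capacity_threshold(scores, 1000, 1.0, 0.1)
flagged = scores > threshold
print(f"capacity: {capacity:.0f} cases ({frac:.1%})")
print(f"threshold (99th percentile): {threshold:.4f}")
print(f"flagged: {flagged.sum()} of {scores.size}")
```

Because the threshold is a percentile of the score distribution itself, the number of flagged cases tracks the review capacity regardless of the scale of the scores; on new data the actual count will vary around that target.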

