How to compute threshold?

Question

warriorforce

May 22, 2022, 18:58

I would like to detect anomalies in univariate time series data. Most examples online show that, after you generate predictions with the model, you calculate a threshold from the training loss, compute the MAE loss on the test data, and compare the two to detect anomalies. Is this the correct way of doing it? Shouldn't there be a different threshold value for each data set? Also, why do all of the examples use only MAE loss to detect anomalies?

Topic keras anomaly-detection regression predictive-modeling

Category Data Science

Ralph Winters · Accepted Answer · May 22, 2022, 18:58

This is just one way of doing it. The example used a training set with a small amount of noise and took the maximum of the mean absolute error (MAE) between the data and the prediction as the threshold for detecting an anomaly. The training set is used to represent the 'normal' time series model. The assumption is that any point with an error beyond that maximum MAE would be 'unreasonable' as part of the true time series model, so it is considered an anomaly.
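As a rough sketch of that idea, the snippet below computes one MAE value per training window and takes the largest as the cutoff. The toy windows, the predictions, and the helper name `window_mae` are made up for illustration; in a real Keras workflow the predictions would come from the trained model, and the same arithmetic would normally be done with arrays.

```python
from statistics import fmean

def window_mae(actual, predicted):
    """Mean absolute error between one window and the model's output for it."""
    return fmean(abs(a - p) for a, p in zip(actual, predicted))

# Hypothetical toy data: each row is one 'normal' training window,
# paired with the model's prediction for that window.
train_windows = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
train_preds = [[1.1, 2.0, 2.9], [2.0, 3.2, 4.1], [2.7, 4.0, 5.3]]

# One error value per training window.
train_mae = [window_mae(w, p) for w, p in zip(train_windows, train_preds)]

# The cutoff is the worst error the model makes on data assumed to be normal.
threshold = max(train_mae)
print(round(threshold, 4))
```

Because the cutoff is a maximum, a single unusual window in the training data raises it, which is one reason the training set is assumed to contain only a small amount of noise.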

But I have also seen MSE (mean squared error) and MAD (median absolute deviation) used. If you want to assume normality, you can also use the mean error plus 3 standard deviations as the cutoff. What is important is that you establish some reasonable cutoff through visualization and include the business in the rule-making decision, or you can look at previous research studies in whatever domain you happen to be studying and see what is typically used, since variances can differ across domains.
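Two of these alternative cutoffs could be sketched as follows. The function names, the default multiplier `k=3.0`, and the sample errors are assumptions chosen for illustration, and the 1.4826 factor is the usual scaling that makes the MAD comparable to a standard deviation for normally distributed data.

```python
from statistics import fmean, median, pstdev

def sigma_threshold(errors, k=3.0):
    """Cutoff at mean + k standard deviations (assumes roughly normal errors)."""
    return fmean(errors) + k * pstdev(errors)

def mad_threshold(errors, k=3.0):
    """Cutoff at median + k scaled MADs (less sensitive to outliers)."""
    med = median(errors)
    mad = median(abs(e - med) for e in errors)
    return med + k * 1.4826 * mad

# Hypothetical per-window training errors.
train_errors = [0.10, 0.12, 0.15, 0.11, 0.13]

print(round(sigma_threshold(train_errors), 4))
print(round(mad_threshold(train_errors), 4))
```

Whichever rule is used, plotting the training errors against the candidate cutoff is a quick way to check that the value is reasonable for the domain.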

Every time series has an error component, and that is essentially what you are trying to measure. It is also possible that the errors are localized (not uniform across all predictions) and vary based on some unknown or improperly measured feature in the model. You don't know that in advance, which is why the errors are usually averaged over the model.

The test data set is only used to validate the detected anomalies. You can't have two thresholds, since you are using only one cutoff point, and it comes from the training data.
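A minimal sketch of that last step, assuming the cutoff has already been fixed from the training errors: the test errors are only compared against it, never used to recompute it. The values and the helper name `flag_anomalies` are hypothetical.

```python
def flag_anomalies(test_errors, threshold):
    """Return the indices of test windows whose error exceeds the training cutoff."""
    return [i for i, err in enumerate(test_errors) if err > threshold]

threshold = 0.2  # hypothetical cutoff, derived from training errors only
test_mae = [0.05, 0.18, 0.45, 0.10, 0.31]  # hypothetical per-window test errors

anomalies = flag_anomalies(test_mae, threshold)
print(anomalies)  # [2, 4]
```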
