Practical problems in anomaly detection when normal data vastly outnumber abnormal data

If there is roughly 1 abnormal sample for every 10,000 normal samples, then even with a true negative rate (specificity) of 99% there will be about 100 false positives, so the precision, TP/(TP+FP), will be very low: around 1% even if the single anomaly is caught.
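
As a back-of-the-envelope check of that arithmetic (assuming the best case of perfect recall on the single anomaly, which is an illustrative assumption, not part of the original claim):

```python
# Rough precision estimate for the 1-in-10,000 scenario described above.
negatives = 10_000          # normal samples
positives = 1               # abnormal samples
specificity = 0.99          # true negative rate

fp = negatives * (1 - specificity)   # ~100 false positives
tp = positives                       # best case: the one anomaly is detected

precision = tp / (tp + fp)
print(f"precision = {precision:.3f}")   # about 0.01, i.e. roughly 1%
```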

To put this kind of anomaly detection to practical use, I think it is necessary to build a model with extremely high predictive accuracy.

How do real-world anomaly detection systems deal with this problem? Is it simply impractical to apply anomaly detection to hard problems where high accuracy cannot be achieved, or where the number of anomalies is too small?

Tags: anomaly, anomaly-detection, machine-learning



The presumption you may be making is that a machine learning model is required for this task, but statistical approaches and less fashionable algorithms are also available.
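
For example, a purely statistical baseline can be as simple as thresholding a z-score against the distribution of the normal data. This is only a minimal sketch; the Gaussian assumption, the synthetic data, and the threshold of 4 standard deviations are illustrative choices, not recommendations:

```python
# Minimal statistical baseline: flag points whose z-score against the
# (assumed roughly Gaussian) normal data exceeds a fixed threshold.
import numpy as np

rng = np.random.default_rng(0)
normal_data = rng.normal(loc=0.0, scale=1.0, size=10_000)  # historical "normal" data
new_points = np.array([0.3, -1.2, 6.5])                    # last point is far from the bulk

mu, sigma = normal_data.mean(), normal_data.std()
z = np.abs(new_points - mu) / sigma
is_anomaly = z > 4.0          # threshold chosen for illustration only
print(is_anomaly)             # the last point should be flagged
```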

The reality is that training an ML model on your own data is going to be a struggle because of the severe class imbalance and the low volume of anomalies. That is why pretrained models, in particular neural networks, are attractive.

To get a feel for the range of options, I find the Python library pyOD a great resource; it implements around 30 different outlier detection algorithms.
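
As a minimal sketch of how one of pyOD's detectors can be used on an imbalanced dataset (the synthetic data and the contamination value, i.e. the expected anomaly fraction, are assumptions for illustration only):

```python
# Unsupervised outlier detection with pyOD's Isolation Forest wrapper.
import numpy as np
from pyod.models.iforest import IForest

rng = np.random.default_rng(42)
X_normal = rng.normal(loc=0.0, scale=1.0, size=(10_000, 5))  # abundant normal points
X_anomal = rng.normal(loc=6.0, scale=1.0, size=(10, 5))      # rare anomalies
X = np.vstack([X_normal, X_anomal])

# contamination is the assumed fraction of anomalies, here ~10 / 10,010
clf = IForest(contamination=0.001, random_state=42)
clf.fit(X)

scores = clf.decision_scores_   # higher score = more anomalous
labels = clf.labels_            # 0 = inlier, 1 = outlier
print("points flagged as anomalies:", int(labels.sum()))
```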
