What kind of learning is needed for anomaly detection? Supervised learning, semi-supervised learning or unsupervised learning?

I have recently been working on anomaly detection. One of the methods uses an autoencoder (AE) to learn the pattern of normal samples.

A sample is flagged as abnormal if it doesn't match the pattern of normal samples.

I train the AE without labels, but labels are still needed to determine which samples are normal and which are abnormal.

Is this kind of training supervised, semi-supervised, or unsupervised learning?



Supervised learning is when your model learns from a fully labeled dataset, meaning every row is assigned to a class. The model learns patterns from this dataset and uses them to make predictions. One limitation of supervised learning is that it can only make judgements based on the patterns it has seen in the training dataset.

Unsupervised learning is the opposite: you have a dataset but no class labels for it. In this case, you try to identify patterns in the data and cluster similar samples together, so that each cluster forms a class. The number of clusters depends on the data and on the data scientist's choices.

Semi-supervised learning algorithms are trained on a combination of labeled and unlabeled data. This is useful for two reasons. First, labeling massive amounts of data for supervised learning is often prohibitively time-consuming and expensive. Second, labels assigned by humans can carry the bias of our own judgment.

Anomaly detection usually falls under unsupervised or semi-supervised learning, because it is rarely possible to have every kind of anomaly labeled in your training dataset. Methods range from classical statistics to machine learning and deep learning. For more detail, see https://www.datascience.com/blog/python-anomaly-detection


It depends on the algorithm you choose. Anomaly detection can be done with all three types of learning, and which one fits depends on your data set and domain.

Supervised - If your data set has a good distribution of samples for both normal and anomalous data, then you can use supervised algorithms such as CNN classifiers.

Semi-Supervised - If your data set has a good distribution of normal data but not of anomalous data, then you can use a semi-supervised approach. Here you train the model on normal data only and compute an anomaly score that measures how much a new sample deviates from the normal data. Examples include AnoGAN and DCNN-based image completion.
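The semi-supervised idea can be sketched with something much simpler than AnoGAN. The example below is only an illustration: it assumes one-dimensional sensor-style readings, models "normal" as a mean and standard deviation estimated from normal-only training data, and uses the absolute z-score as the anomaly score. The data, the function names and the threshold of 3.0 are all made up for the example.

```python
import statistics

def fit_normal_model(normal_values):
    """Estimate mean and standard deviation from normal-only training data."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)

def anomaly_score(value, model):
    """Absolute z-score: distance from the normal mean, in standard deviations."""
    mean, std = model
    return abs(value - mean) / std

# Training set contains only samples known to be normal (hypothetical readings).
normal_train = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
model = fit_normal_model(normal_train)

# New, unlabeled samples are scored by how far they deviate from "normal".
new_readings = [10.1, 9.9, 14.5]
scores = [anomaly_score(v, model) for v in new_readings]

THRESHOLD = 3.0  # assumed cut-off; in practice tuned on validation data
flags = [s > THRESHOLD for s in scores]
print(flags)
```

The only label information used is "these training samples are normal", which is what makes the setup semi-supervised rather than unsupervised.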

Unsupervised - If your data set doesn't have a good distribution of either normal or anomalous data, then you can consider clustering algorithms and inspect the clusters that have only a few members.
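As a rough sketch of the unsupervised route, the code below groups unlabeled 2-D points with a simple distance-threshold (single-linkage) clustering and then flags the members of very small clusters. This stands in for whatever clustering algorithm you would actually use (k-means, DBSCAN, and so on). The points, the `eps` radius and the `MIN_CLUSTER_SIZE` value are assumptions chosen for illustration.

```python
import math
from collections import Counter

def threshold_clusters(points, eps):
    """Label connected components: points linked by a chain of
    neighbours closer than eps end up in the same cluster."""
    labels = [None] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and math.dist(points[i], points[j]) < eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Unlabeled data: two dense groups and one isolated point (hypothetical).
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (9.0, 0.5),
]
labels = threshold_clusters(points, eps=1.0)

# Clusters with fewer than MIN_CLUSTER_SIZE members are treated as anomalous.
MIN_CLUSTER_SIZE = 2
sizes = Counter(labels)
anomalies = [p for p, lab in zip(points, labels) if sizes[lab] < MIN_CLUSTER_SIZE]
print(anomalies)
```

No labels are used at any point; whether a small cluster really is an anomaly still has to be judged by someone who knows the domain.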


Using an AE for anomaly detection rests on the assumption that we can learn a non-linear representation of the data in lower dimensions. Because the number of normal points is far larger than the number of outliers, the learning process is dominated by normal data. The learned network favors minimizing the reconstruction error of normal points, which leads to a higher reconstruction error for outliers. That reconstruction error can therefore be used as the outlier score.
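To make the reconstruction-error argument concrete, here is a minimal sketch in pure Python. It is not a real non-linear AE: it is a tiny linear autoencoder with tied weights and a one-dimensional bottleneck, trained by gradient descent on hypothetical 2-D data in which 21 normal points lie near a line and a single outlier does not. All names, data and hyperparameters are assumptions made for the illustration.

```python
def train_linear_autoencoder(data, lr=0.05, epochs=500):
    """Tied-weight linear autoencoder with a 1-D bottleneck.

    Encoder: z = x . w        Decoder: x_hat = z * w
    Trained by full-batch gradient descent on the squared reconstruction error.
    """
    w = [0.5, 0.5]  # arbitrary non-zero starting weights
    n = len(data)
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x in data:
            z = x[0] * w[0] + x[1] * w[1]            # encode
            r = [x[0] - z * w[0], x[1] - z * w[1]]   # residual x - x_hat
            rw = r[0] * w[0] + r[1] * w[1]
            # gradient of ||r||^2 with respect to w
            grad[0] += -2.0 * (rw * x[0] + z * r[0])
            grad[1] += -2.0 * (rw * x[1] + z * r[1])
        w = [w[0] - lr * grad[0] / n, w[1] - lr * grad[1] / n]
    return w

def reconstruction_error(x, w):
    """Squared distance between x and its reconstruction."""
    z = x[0] * w[0] + x[1] * w[1]
    return (x[0] - z * w[0]) ** 2 + (x[1] - z * w[1]) ** 2

# 21 normal points close to the line y = 2x, with small deterministic noise.
normal = [(k / 10, 2 * k / 10 + (0.05 if k % 2 == 0 else -0.05))
          for k in range(-10, 11)]
outlier = (1.0, -2.0)
data = normal + [outlier]  # unlabeled: the outlier is part of the training set

# Center the data (plays the role of the bias terms in a real AE).
mean = [sum(p[0] for p in data) / len(data), sum(p[1] for p in data) / len(data)]
centered = [(p[0] - mean[0], p[1] - mean[1]) for p in data]

w = train_linear_autoencoder(centered)
errors = [reconstruction_error(x, w) for x in centered]

# The point with the largest reconstruction error is the top outlier candidate.
most_anomalous = max(range(len(errors)), key=errors.__getitem__)
print(most_anomalous, len(data) - 1)
```

The model is trained on all points without labels, yet the fit is dominated by the normal points, so the outlier ends up with by far the largest reconstruction error. In practice you would turn these scores into decisions with a threshold, for example by flagging the highest-scoring fraction of samples.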

To summarize, an AE can detect anomalies in an unsupervised manner. This is easy to do with the Python Outlier Detection (PyOD) toolkit, which includes an AE example.
