How to detect anomalous points

As is clear from the figure, the blue points, which don't follow the trend, are anomalous points.

I'm looking for the best non-parametric method to detect those points. I have tried some outlier detection methods, such as thresholding on the standard deviation, but they don't give good results, even though the anomalies are obvious in the figure.

Topic anomaly anomaly-detection outlier visualization statistics

Category Data Science


Try modeling the time series with a midpoint estimate (expected value at each time) and a band estimate (permissible spread at any given midpoint). Then being an anomaly is just a matter of being outside the band.

How you estimate the midpoints depends on the problem. For example, if your time series has one or more seasonalities, you could use time series decomposition (e.g., seasonal-trend decomposition using loess, often called STL). The band estimate is also problem-dependent.
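
To make the midpoint-plus-band idea concrete, here is a minimal sketch in plain Python. It is not the method from the linked post: it assumes a centered rolling median as the midpoint estimate and a band of `k` scaled median absolute deviations of the residuals. The function names, the window size, `k`, and the data are all hypothetical choices for illustration.

```python
from statistics import median

def rolling_median(values, window):
    """Centered rolling median; the window shrinks near the edges."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

def band_anomalies(values, window=7, k=3.0):
    """Return indices whose residual from the midpoint falls outside the band."""
    midpoints = rolling_median(values, window)
    residuals = [v - m for v, m in zip(values, midpoints)]
    center = median(residuals)
    # Median absolute deviation, scaled by 1.4826 so that it is comparable
    # to a standard deviation when the residuals are roughly normal.
    mad = median(abs(r - center) for r in residuals)
    half_width = k * 1.4826 * mad
    return [i for i, r in enumerate(residuals) if abs(r) > half_width]

# Hypothetical data: a slow upward trend with small alternating noise
# and two injected spikes at indices 12 and 30.
series = [10 + 0.05 * t + (0.3 if t % 2 else -0.3) for t in range(40)]
series[12] += 8
series[30] -= 9

print(band_anomalies(series))
```

A median-based midpoint and band are used here only because they are not pulled around much by the anomalies themselves. For seasonal data, the midpoint would more likely come from a decomposition, as described above.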

I did some work like this for a bookings data time series. I wrote about it here: https://techblog.expedia.com/2016/07/28/applying-data-science-to-monitoring/


I would recommend a rolling average. It can be quite robust and is not upset by slow changes over time. You can then use your existing data to determine the level of deviation at which you want an alarm. This choice depends on whether you would rather catch most of the deviations or minimize the number of false-positive alarms.

EDIT1: In the end you will always need to tweak parameters. There is no real ground truth. An anomaly is always a subjective thing.
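
As a rough illustration of the rolling-average approach, the sketch below compares each point with the mean of the preceding window and sweeps a few alarm thresholds over historical data. The helper names, the window size, the thresholds, and the data are made up for the example.

```python
from statistics import mean

def rolling_deviation(values, window):
    """Absolute deviation of each point from the mean of the preceding `window` points."""
    return [abs(values[i] - mean(values[i - window:i]))
            for i in range(window, len(values))]

def alarms(values, window, threshold):
    """Indices (into `values`) whose deviation exceeds the threshold."""
    devs = rolling_deviation(values, window)
    return [i + window for i, d in enumerate(devs) if d > threshold]

# Hypothetical history with two obvious excursions (indices 6 and 12).
history = [20.0, 20.4, 19.8, 20.1, 20.6, 20.2, 26.0, 20.3,
           20.5, 20.9, 20.7, 21.0, 15.5, 21.2, 21.1, 21.4]

# Sweep the threshold to see the trade-off between sensitivity and false alarms.
for threshold in (0.5, 2.0, 4.0):
    print(threshold, alarms(history, window=4, threshold=threshold))
```

With a low threshold, the points right after a spike also raise alarms, because the spike has contaminated the rolling mean. That is one reason the threshold (and the window) end up being tuned by hand.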


In your example, what differentiates the clusters is not the raw value but a rapid departure from the previous points. I might look into change-point detection. It is nonparametric, but it still requires some fiddling with tuning parameters.
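
To show the flavor of this, here is a crude two-window mean-shift detector in plain Python. It is a stand-in for a proper change-point method, not a specific published algorithm: each index is scored by how much the mean of the next `window` points differs from the mean of the previous `window` points, and local maxima above `min_shift` are reported. Both parameters and the data are hypothetical.

```python
from statistics import mean

def mean_shift_scores(values, window):
    """Score index i by |mean(values[i:i+window]) - mean(values[i-window:i])|."""
    scores = [0.0] * len(values)
    for i in range(window, len(values) - window + 1):
        after = mean(values[i:i + window])
        before = mean(values[i - window:i])
        scores[i] = abs(after - before)
    return scores

def change_points(values, window=5, min_shift=3.0):
    """Indices where the shift score is a local maximum and at least min_shift."""
    scores = mean_shift_scores(values, window)
    points = []
    for i in range(1, len(scores) - 1):
        is_peak = scores[i] >= scores[i - 1] and scores[i] > scores[i + 1]
        if is_peak and scores[i] >= min_shift:
            points.append(i)
    return points

# Hypothetical series: a level of 10, an abrupt block at 18, then back to 10.
series = [10.0] * 15 + [18.0] * 8 + [10.0] * 15

print(change_points(series))
```

The segment between two detected change points can then be treated as the anomalous stretch. The window length and minimum shift are exactly the kind of tuning parameters mentioned above.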


One solution is to cluster the data with the DBSCAN algorithm. If you set a proper radius for DBSCAN, you will get three clusters. You can then flag as anomalous any cluster whose number of members is below a threshold.
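
Below is a minimal pure-Python DBSCAN, written only to illustrate the idea. In practice a library implementation (for example scikit-learn's `DBSCAN`) would be the usual choice. The radius `eps`, `min_pts`, the cluster-size threshold, and the (time, value) data are assumptions made up for the example, and it assumes both axes are already on comparable scales.

```python
from collections import Counter
from math import dist

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels

def small_cluster_points(labels, min_cluster_size):
    """Indices of noise points and of points in clusters below the size threshold."""
    sizes = Counter(label for label in labels if label != -1)
    return [i for i, label in enumerate(labels)
            if label == -1 or sizes[label] < min_cluster_size]

# Hypothetical (time, value) data: one main trend and two small off-trend groups.
points = ([(0.1 * t, 10.0 + 0.1 * (t % 3)) for t in range(30)]
          + [(1.0 + 0.1 * k, 20.0) for k in range(4)]
          + [(2.0 + 0.1 * k, 2.0) for k in range(3)])

labels = dbscan(points, eps=0.5, min_pts=3)
anomalies = small_cluster_points(labels, min_cluster_size=10)
print(sorted(set(labels)), anomalies)
```

The result depends heavily on `eps` and on how the axes are scaled, so some trial and error with those values should be expected.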
