Anomaly Detection for Large Time Series Data

I am working on detecting anomalies in a large time series data set. It is updated regularly and consists of more than 30 parameters. I am using R as the reference language.

This is my first project of this type and I am unfamiliar with most of the techniques. I have six weeks to implement a good analytical toolbox to improve the quality of the control checks on the production line.

I have found a couple of potential methods to analyse the data, including statistical machine learning, deep learning with autoencoder neural networks, and clustering approaches. The chosen method should detect the anomalies/outliers on its own. It does not need to run in real time. Which approach would you recommend implementing for the scope of the project, given the structure of the data?

Tags: anomaly, data-analysis, deep-learning, time-series, statistics



Following J. Tukey, you should plot, draw graphs, and visualize the data until you have a solid set of example anomalies.
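For instance, a quick way to eyeball the series and the per-parameter spread in base R; the data frame name `df` is an assumption, not something given in the question:

```r
# Minimal sketch, assuming the data sit in a data frame `df` with one numeric
# column per parameter (the name `df` is hypothetical).
plot.ts(ts(df[, 1:10]), main = "First 10 parameters")   # one panel per series
boxplot(scale(df), las = 2, main = "Standardised spread per parameter")
```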

Then apply Tukey's fences to each of the 30 parameters. Let $q_1$ and $q_3$ be the first and third quartiles, $d = q_3 - q_1$ the inter-quartile distance, and define as an outlier any observation outside the interval $q_1 - k\cdot d < x < q_3 + k\cdot d$, where $k$ is a constant. Traditionally, $k = 1.5$ marks an outlier and $k = 3$ marks a point that is far out. However, the actual value of $k$ should be tested against your examples.
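A minimal R sketch of this fence check, assuming the parameters sit in a hypothetical data frame `df` with one numeric column each:

```r
# Flag values outside Tukey's fences for one parameter.
tukey_fence <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  d <- q[2] - q[1]                        # inter-quartile distance
  x < q[1] - k * d | x > q[2] + k * d     # TRUE where the fence is breached
}

# Apply the fence to every parameter; use k = 1.5 for "outside", k = 3 for "far out".
flags <- sapply(df, tukey_fence, k = 1.5)
which(rowSums(flags) > 0)                 # rows with at least one flagged parameter
```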

Then run a cluster analysis (for instance based on $k$-nearest-neighbour distances) and define as an outlier any point that ends up isolated in its cluster. Again, use your examples to test various values of $k$.
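One possible base-R sketch of such a distance-based isolation score, again assuming a hypothetical data frame `df`; note that the pairwise distance matrix is quadratic in the number of rows, so subsample first if the series is very long (the `kNNdist()` function in the dbscan package computes a similar score without building the full matrix):

```r
# Sketch: distance to the k-th nearest neighbour as an isolation score.
X <- scale(df)                           # standardise so no parameter dominates
D <- as.matrix(dist(X))                  # pairwise Euclidean distances, O(n^2)
k <- 5                                   # neighbourhood size; tune on your examples
knn_dist <- apply(D, 1, function(d) sort(d)[k + 1])   # k-th neighbour, self excluded
# Flag points whose neighbourhood distance is unusually large, e.g. with a Tukey fence.
fence <- quantile(knn_dist, 0.75) + 1.5 * IQR(knn_dist)
which(knn_dist > fence)
```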
