How to detect anomalies in each feature - time series

I have a dataset with 5 features corresponding to 5 sensors that measure the state of an accelerator every three seconds. It is structured as follows:

Sensor 1 | Sensor 2 | Sensor 3 | Sensor 4 | Sensor 5 | Label
   1.5       1.1        0.8        1.2        1.2       0
   1.2       1.4        1.4        1.4        1.1       0
   1.2       1.1        1.2        1.3        1.5       0

The label indicates whether the time series is an anomaly (=1) or not (=0). I have an anomaly detection task, and the frameworks I've chosen (1, 2) output an array of length 3 containing the predicted labels: (0, 1, 0). I have usually worked with anomaly detection frameworks that gave me a threshold, so I could easily mark the values above it as anomalies.

In this specific case, with this array of length 3, is it right to assume that I could rewrite the dataset above like this? (True = anomaly, False = normal)

Sensor 1 | Sensor 2 | Sensor 3 | Sensor 4 | Sensor 5 |
 False       False      False     False       False    
 True        True       True      True        True    
 False       False      False     False       False     

So, instead of marking one value at a time, it directly marks the whole time series as an anomaly?



I believe the assumption you've made (that a predicted label of 1 marks every sensor value in that row as anomalous) is incorrect.

A thorough explanation depends on which algorithm you're using to decide whether the final label is 0 or 1.

For anomaly detection, you can approach the problem as supervised (pretty much a classification problem), unsupervised, or semi-supervised. Judging from your data, I assume you've chosen the unsupervised approach.

Unsupervised Anomaly Detection

Unsupervised anomaly detection has a plethora of algorithm families: distance-based, statistical, classification-based, angle-based, DBSCAN, neural networks and more. Each family contains algorithms that can help detect anomalous values, such as specialised NN architectures.
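As a rough illustration of the distance-based family, here is a minimal sketch that scores each row by its mean distance to its nearest neighbours. The data is made up, and the function name `knn_anomaly_scores`, the choice of `k` and the 99% cut-off are all assumptions for the example, not what your frameworks do internally.

```python
import numpy as np

def knn_anomaly_scores(X, k=3):
    """Score each row by its mean distance to its k nearest other rows."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all rows, shape (n_rows, n_rows).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Ignore each row's zero distance to itself.
    np.fill_diagonal(dists, np.inf)
    # Mean of the k smallest distances in every row.
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

# Made-up readings: 200 normal rows from 5 sensors plus one injected odd row.
rng = np.random.default_rng(0)
X = rng.normal(loc=1.2, scale=0.15, size=(200, 5))
X = np.vstack([X, [3.0, 1.2, 1.1, 1.3, 1.2]])  # only Sensor 1 is off

scores = knn_anomaly_scores(X, k=3)
threshold = np.quantile(scores, 0.99)  # arbitrary cut-off, yours to choose
labels = (scores > threshold).astype(int)

print(scores.shape)  # one score per row, not one per sensor
print(labels[-1])    # label assigned to the injected row
```

Note that the detector returns one score (and so one label) per row. It flags the time step as a whole and says nothing about which sensor caused it.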

Some algorithms (most of the ones I've used) include some form of dimensionality reduction, such as PCA. Because of that, it's a lot more difficult to tell whether a specific column (e.g. Sensor 1) is anomalous on its own.
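To make this concrete, here is a small sketch using PCA reconstruction error as the anomaly score. It assumes synthetic data in which 5 sensors are driven by 2 hidden factors; the mixing matrix, the noise level and the choice of 2 components are invented for the example. Only Sensor 1 is corrupted in the last row.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: 5 correlated sensors driven by 2 hidden factors plus small noise.
latent = rng.normal(size=(300, 2))
mixing = np.array([[1.0, 0.8, 0.5, 0.2, 0.0],
                   [0.0, 0.3, 0.6, 0.9, 1.0]])
X = latent @ mixing + 0.05 * rng.normal(size=(300, 5))
X[-1, 0] += 5.0  # corrupt only Sensor 1 in the last row

# PCA via SVD on the centred data, keeping 2 components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:2]  # shape (2, 5)

# Project down to 2 dimensions and back; the residual is what PCA cannot explain.
reconstructed = Xc @ components.T @ components
residual = (Xc - reconstructed) ** 2  # shape (n_rows, 5)

row_score = residual.sum(axis=1)  # the per-row value a detector would threshold
print(row_score.argmax())         # index of the row with the largest score
print(residual[-1].round(2))      # how that row's error is spread over the sensors
```

In this toy setup the squared residual of the corrupted row is not confined to the Sensor 1 column, because the projection mixes the sensors together. That is one reason a row-level label can neither be copied to every sensor nor pinned on a single one without further analysis.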

A better way to wrap your head around how and why a data point is tagged as anomalous would be to plot a t-SNE graph.
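A minimal sketch of such a plot, assuming scikit-learn and matplotlib are installed. The data is synthetic, `labels` is a stand-in for whatever your detector predicts, and the output file name is a placeholder.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend: write a file instead of opening a window
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Made-up readings: 200 rows from 5 sensors, the last 5 rows shifted upwards.
rng = np.random.default_rng(0)
X = rng.normal(loc=1.2, scale=0.15, size=(200, 5))
X[-5:] += rng.normal(loc=1.5, scale=0.3, size=(5, 5))

# Stand-in for the row-level labels returned by an anomaly detector.
labels = np.zeros(200, dtype=int)
labels[-5:] = 1

# Embed the 5-dimensional rows in 2 dimensions for plotting.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

fig, ax = plt.subplots()
normal = labels == 0
ax.scatter(embedding[normal, 0], embedding[normal, 1], s=10, label="normal")
ax.scatter(embedding[~normal, 0], embedding[~normal, 1], s=30, color="red", label="anomaly")
ax.set_title("t-SNE embedding of sensor rows")
ax.legend()
fig.savefig("tsne_anomalies.png")  # placeholder file name
```

Each point in the plot is one row (one time step across all 5 sensors), so the picture shows how flagged rows sit relative to the rest, not which individual sensor is responsible.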

If you'd like me to edit my answer to create an anomaly detection model that tags data points as anomalous or not (along with the anomaly score, so that you can set your own threshold) and to plot a t-SNE graph, let me know.
