This time series consists of time frames, each of which is 8K (frequencies) × 151 (time samples) in 0.5 s (1.2288 million samples per half second overall). I need to find anomalies across the rows (frequencies) and report which rows (frequencies) are anomalous, using an unsupervised learning method. Do you have an idea which statistical parameter is most useful for this: mean, max, min, median, variance, or some other statistic of these 151 samples? Which parameter should I use? (I …
Let's say we have a set of matrices that we know are non-anomalous. We now receive a new matrix and want to know whether it belongs to the group or is way off. Is there a way to do that? I'm thinking of something similar to MAD (median absolute deviation), but for matrices.
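One possible reading of "MAD for matrices", offered only as a minimal sketch: take the element-wise median of the reference matrices as a center, measure each matrix's Frobenius distance to that center, and score a new matrix by how far its distance lies from the median reference distance in MAD units. The function names, the choice of Frobenius distance, and the toy data below are assumptions made for illustration, not something taken from the question.

```python
import math
from statistics import median


def frobenius_distance(a, b):
    """Frobenius norm of (a - b) for two equally shaped nested-list matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def fit_reference(matrices):
    """Return (element-wise median matrix, median distance, MAD of distances)."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    center = [[median(m[i][j] for m in matrices) for j in range(cols)]
              for i in range(rows)]
    dists = [frobenius_distance(m, center) for m in matrices]
    med = median(dists)
    mad = median(abs(d - med) for d in dists)
    return center, med, mad


def robust_score(matrix, center, med, mad):
    """Robust z-score of the matrix's distance to the center.

    1.4826 rescales the MAD so it is comparable to a standard deviation
    when the distances are roughly normal (an assumption).
    """
    d = frobenius_distance(matrix, center)
    scale = 1.4826 * mad
    if scale == 0:
        return math.inf if d > med else 0.0
    return (d - med) / scale


# Hypothetical toy data: five "normal" 2x2 matrices.
reference = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.1, 2.0], [3.0, 4.1]],
    [[0.9, 2.1], [3.1, 3.9]],
    [[1.0, 1.9], [2.9, 4.0]],
    [[1.2, 2.0], [3.0, 4.0]],
]
center, med, mad = fit_reference(reference)

score_near = robust_score([[1.0, 2.0], [3.0, 4.1]], center, med, mad)
score_far = robust_score([[5.0, 5.0], [5.0, 5.0]], center, med, mad)
print(score_near, score_far)
```

A cutoff such as a score above 3 is a common rule of thumb rather than a guarantee, and a different matrix distance may suit the data better.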
How to test unsupervised learning methods for anomaly detection? I am looking for a test strategy to evaluate the results of my anomaly detection technique. What would you suggest beyond comparing the results of different algorithms? My data consists of very low-frequency time series.
I'm dealing with outlier detection in data streams. I'm looking for a way to summarize my data and obtain important statistics such as the mean and variance. I want to know whether cluster features or micro-clusters are suitable for this.
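For context, a cluster feature in the BIRCH sense is usually described as the triple (N, LS, SS): the point count, the per-dimension linear sum, and the per-dimension sum of squares. The sketch below assumes that definition and shows how mean and variance can be recovered from it without storing the stream. The class and method names are hypothetical.

```python
class ClusterFeature:
    """Additive summary (N, LS, SS) of the points seen so far."""

    def __init__(self, dim):
        self.n = 0                # number of points absorbed
        self.ls = [0.0] * dim     # linear sum per dimension
        self.ss = [0.0] * dim     # sum of squares per dimension

    def add(self, point):
        """Absorb one point in O(dim) time and constant memory."""
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def merge(self, other):
        """Combine two summaries; the triple is additive."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]

    def mean(self):
        return [s / self.n for s in self.ls]

    def variance(self):
        """Population variance per dimension: SS/N - (LS/N)^2."""
        return [max(ss / self.n - (ls / self.n) ** 2, 0.0)
                for ls, ss in zip(self.ls, self.ss)]


# Hypothetical two-dimensional stream.
cf = ClusterFeature(dim=2)
for p in [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]:
    cf.add(p)

# The same points split across two summaries, then merged.
left, right = ClusterFeature(dim=2), ClusterFeature(dim=2)
left.add((1.0, 10.0))
right.add((2.0, 20.0))
right.add((3.0, 30.0))
left.merge(right)

print(cf.mean(), cf.variance())
```

Because the triple is additive, summaries built on separate chunks of the stream can be merged later. Note that the SS/N - (LS/N)^2 form can lose precision when the mean is large relative to the spread.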
Suppose I have a dataset with categorical and numerical features and a label of 1 or 0 indicating whether a row is anomalous or normal, respectively. Is it possible to build a model that takes these numerical and categorical features as input and outputs how risky a record is? Edit: My thought was to train a supervised anomaly detection method that will classify the records as 0 or 1. But instead of …
I would like to create an anomaly detection model that assigns a probability of risk instead of a label (1 or 0). My problem is that I only know for sure which records are anomalous, not which are normal. Given this, would it be better to use unsupervised anomaly detection rather than semi-supervised or supervised methods? Note: I have high-dimensional data (20-40+ features), with a few hundred known anomalies and around a thousand records whose status I do not know.
Often the hardest part of solving an anomaly detection problem is finding the right technique for the job. Different anomaly detection techniques are better suited to different types of data and different problems. Are there any flowcharts or tree diagrams designed to give users a rough guide to the different techniques and to which one to try on a given kind of data or problem?
I have a dataset with 5 features corresponding to 5 sensors that measure the state of an accelerator every three seconds. It is structured as follows:
Sensor 1 | Sensor 2 | Sensor 3 | Sensor 4 | Sensor 5 | Label
1.5 | 1.1 | 0.8 | 1.2 | 1.2 | 0
1.2 | 1.4 | 1.4 | 1.4 | 1.1 | 0
1.2 | 1.1 | 1.2 | 1.3 | 1.5 | 0
The label indicates whether the time series is anomalous (=1) or not (=0). I have an anomaly detection task, and the frameworks …
I have a task to generate some anomalous points in a real-world dataset with 15 features and in a synthetic dataset with 5 features. I was thinking of using the correlation between features, but that will be hard to do for 15 features. Thanks!
I am using Isolation Forests for anomaly detection. Say my set has 10 variables, var1, var2, ..., var10, and I have found an anomaly. Can I rank the 10 variables in such a way that I can say the main reason for the anomaly is, say, var6? For example, if I had var1, var2, var3 only, and my set were:
5 25 109
7 26 111
6 23 108
6 26 109
6 978 108 …
Is there a comprehensive open source package (preferably in Python or R) that can be used for anomaly detection in time series? There is a one-class SVM in scikit-learn, but it is not designed for time series data. I’m looking for more sophisticated packages that, for example, use Bayesian networks for anomaly detection.
If the ratio of abnormal to normal data is about 1 to 10,000, then even with a true negative rate of 99% there will be 100 false positives per 10,000 normal samples, and the precision (TP/(TP+FP)) will be low. If this kind of anomaly detection is to be put to practical use, I think it is necessary to create a model with fairly high prediction accuracy. How do real-world applications of anomaly detection deal with this problem? Is …
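The arithmetic in this question can be checked with a short sketch. It uses the 1 to 10,000 ratio and the 99% true negative rate from the question. The true positive rate of 1.0 (every anomaly is caught) is an added best-case assumption, and the function names are hypothetical.

```python
def expected_counts(n_pos, n_neg, tpr, tnr):
    """Expected true positives and false positives for given class counts and rates."""
    tp = n_pos * tpr
    fp = n_neg * (1.0 - tnr)
    return tp, fp


def precision(n_pos, n_neg, tpr, tnr):
    """Precision = TP / (TP + FP), computed from expected counts."""
    tp, fp = expected_counts(n_pos, n_neg, tpr, tnr)
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


# 1 anomaly per 10,000 normal records, TNR = 99% (from the question),
# TPR = 100% (best-case assumption).
tp, fp = expected_counts(n_pos=1, n_neg=10_000, tpr=1.0, tnr=0.99)
p = precision(n_pos=1, n_neg=10_000, tpr=1.0, tnr=0.99)
print(f"TP={tp:.1f}, FP={fp:.1f}, precision={p:.4f}")
```

Under these assumptions the precision is 1/101, a little under 1%. Reaching 50% precision at the same base rate would require about one false positive per 10,000 normal records, that is, a true negative rate of 99.99%.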
I have a machine learning/anomaly detection problem. I have a variable Y and several other variables X. The purpose is to quantify the degree of abnormality of the data in Y while taking into account the values of the other variables (the relationship between Y and X). Normally, an anomaly detection algorithm would find anomalies in the whole data (Y + X), but in my case I want to zoom in on Y because …
I have a set of force profiles from an industrial machine. I'm trying to develop an algorithm that detects when a new profile is "anomalous" with respect to those recorded under "normal operating conditions". In the picture below you can see the force profiles (as a function of time). I want the blue curve to be flagged as anomalous. What approaches do you suggest? I'm thinking about using some statistical distance (like Mahalanobis) to check the similarity of a new curve …
I need a sanity check. I want to create an anomaly detection system. The logic I am planning to use is the following: Find anomalies in the past using the Seasonal Hybrid Extreme Studentized Deviate test. Binarise the result (1 for the anomalies, 0 for the trend). Run several algorithms (autoencoders, SVM, logistic regression, naive Bayes, lasso regression, etc.) on variables that are correlated, then validate the models and use them. Does the binarisation process make sense?
I have panel data on 900,000 different entities with 384 time steps, and the data is not normally distributed. I am looking for outliers/anomalies. This is unsupervised, as I have no examples of anomalies/outliers. Apart from clustering methods such as K-means and DBSCAN/HDBSCAN, what options do I have?
I have an interesting question. My code needs to be able to handle structured data whose structure I don't know much about at development time. I know the samples follow a schema that can be nested, and the leaves contain basic primitives such as floats, integers and categoricals. I will have access to the schema at training time. I want to train a model that can detect anomalies or, more directly, whether or not a sample comes from the …
I am working on detecting anomalies within a large time series data set. It is updated on a regular basis and consists of more than 30 parameters. I am using R as a reference language. This is my first time working on this type of project, and I am unfamiliar with most of the techniques. I have 6 weeks to implement a good analytical toolbox to enhance the quality of the control checks on the production line. I have …
As is clear from the figure, the blue points, which don't follow the trend, are anomalous. I'm wondering what the best non-parametric method to detect those points is. I have tested some outlier detection methods, such as a standard deviation threshold, but they don't give good results even though the anomalies are obvious in the figure.