anomaly-detection

Which statistical parameters are most useful for detecting anomalies and outliers: mean, max, min, var?

user10296606

June 2, 2022 15:01

This time series contains time frames, each of which is 8K (frequencies) × 151 (time samples) in 0.5 s (overall 1.2288 million samples per half second). I need to find anomalies across the different rows (frequencies) and report which rows (frequencies) are anomalous, using an unsupervised learning method. Do you have an idea which statistical parameter is most useful for this: mean, max, min, median, var, or some other parameter of these 151 samples? Which parameter should I use? (I …

Topic: anomaly anomaly-detection time-series statistics machine-learning

Category: Data Science
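For the question above about which per-row statistic to use, one common starting point is not to pick a single statistic but to summarize every row (frequency) with several at once and flag rows whose summary sits far from the typical row. The sketch below assumes a frame is a NumPy array of shape (n_frequencies, n_time_samples) and uses a robust z-score (median and MAD) per statistic; the function name `flag_anomalous_rows`, the chosen statistics, and the 3.5 cutoff are illustrative assumptions, not the asker's method.

```python
import numpy as np

def flag_anomalous_rows(frame, cutoff=3.5):
    """Return indices of rows (frequencies) whose summary statistics are outlying.

    frame: array of shape (n_rows, n_samples), e.g. (8192, 151).
    """
    # One feature vector per row: mean, max, min, median, variance.
    feats = np.column_stack([
        frame.mean(axis=1),
        frame.max(axis=1),
        frame.min(axis=1),
        np.median(frame, axis=1),
        frame.var(axis=1),
    ])
    # Robust z-score per statistic: |x - median| / (1.4826 * MAD).
    med = np.median(feats, axis=0)
    mad = np.median(np.abs(feats - med), axis=0)
    mad[mad == 0] = np.finfo(float).eps  # avoid division by zero
    z = np.abs(feats - med) / (1.4826 * mad)
    # A row is flagged if any one of its statistics is extreme.
    return np.flatnonzero((z > cutoff).any(axis=1))

rng = np.random.default_rng(0)
frame = rng.normal(size=(100, 151))
frame[42] += 5.0  # inject a shifted row for illustration
print(flag_anomalous_rows(frame))
```

Looking at which statistic drives each flagged row's z-score also hints at the kind of anomaly (a level shift shows up in the mean and median, a burst in max or var).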

How to fit a model on validation_data?

warriorforce

June 2, 2022 07:01

Can you help me understand this better? I need to detect anomalies, so I am trying to fit an LSTM model using validation_data, but the losses do not converge. Do they really need to converge? Should the validation data resemble the training data, the test data, or something in between? Also, which value should be lower, loss or val_loss? Thank you!

Topic: lstm keras anomaly-detection regression

Category: Data Science

Cross-validation for anomaly detection on time series data

kohlstein

June 1, 2022 15:32

I want to perform k-fold cross-validation in a setting where I have a training dataset consisting of a sequential time series that is fully benign and a test dataset (also a sequential time series) that contains labeled anomalies. I already looked at this post, but because my data is sequential, the answer there doesn't apply. I am especially stuck on the fact that for K-fold cross-validation, you use (k-1)/k parts of your data for training and 1/k parts …

Topic: anomaly-detection cross-validation time-series

Category: Data Science
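For the cross-validation question above, a widely used alternative to shuffled k-fold on sequential data is forward-chaining (expanding-window) validation: each fold trains on everything before a cut point and validates on the next contiguous block, so no fold looks into the future. The sketch below generates such index splits in plain Python; it is similar in spirit to scikit-learn's `TimeSeriesSplit`, and the function name `forward_chaining_splits` is hypothetical.

```python
def forward_chaining_splits(n_samples, n_folds):
    """Yield (train_indices, val_indices) pairs that never look into the future.

    The series is cut into n_folds + 1 contiguous blocks; fold k trains on
    blocks 0..k-1 and validates on block k. The last fold also absorbs any
    leftover samples at the end of the series.
    """
    block = n_samples // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(1, n_folds + 1):
        train_end = k * block
        val_end = train_end + block if k < n_folds else n_samples
        yield list(range(0, train_end)), list(range(train_end, val_end))

for train_idx, val_idx in forward_chaining_splits(10, 3):
    print("train up to", train_idx[-1], "| validate", val_idx[0], "to", val_idx[-1])
```

In the setting described, the validation blocks would come from the benign training series, so they can only estimate false-positive behaviour; the labeled test series would still be held out for measuring detection.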

Labels as features in anomaly detection

Daniele

May 31, 2022 01:04

I have a dataset originally built for a classification problem. Because Y is highly imbalanced, I chose to reframe it as an anomaly detection task. Should I use the Y I have as a feature inside the anomaly detection model? Is that an overfitting risk?

Topic: multiclass-classification anomaly-detection class-imbalance

Category: Data Science

Isolation Forest Score Function Theory

Samyak Shah

May 29, 2022 11:07

I am currently reading this paper on isolation forests. In the section about the score function, they mention the following. For context, $h(x)$ is defined as the path length of a data point traversing an iTree, and $n$ is the sample size used to grow the iTree. The difficulty in deriving such a score from $h(x)$ is that while the maximum possible height of iTree grows in the order of $n$, the average height grows in the order of $\log(n)$. …

Topic: anomaly-detection decision-trees random-forest

Category: Data Science
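For the score-function question above: in the formulation usually cited from the original Isolation Forest paper, the difficulty is resolved by normalizing $h(x)$ with $c(n) = 2H(n-1) - 2(n-1)/n$, the average path length of an unsuccessful search in a binary search tree built from $n$ points, where $H(i) \approx \ln(i) + 0.5772156649$ (Euler's constant). Because $c(n)$ grows like $\log(n)$, the ratio $E[h(x)]/c(n)$ is comparable across sample sizes, and the score is $s(x, n) = 2^{-E[h(x)]/c(n)}$. The sketch below computes these quantities; the function names are our own.

```python
import math

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful BST search among n points:
    c(n) = 2H(n-1) - 2(n-1)/n, with H(i) approximated by ln(i) + gamma."""
    if n < 2:
        raise ValueError("c(n) is defined here for n >= 2")
    harmonic = math.log(n - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, n):
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); values close to 1 suggest an anomaly."""
    return 2.0 ** (-mean_path_length / c(n))

n = 256
print(c(n))                      # normaliser for trees grown on 256 samples
print(anomaly_score(c(n), n))    # average path length -> score 0.5
print(anomaly_score(2.0, n))     # very short path -> score well above 0.5
```

Reading the scores: $E[h(x)] \to 0$ gives $s \to 1$ (isolated almost immediately), $E[h(x)] = c(n)$ gives exactly $0.5$, and long paths push $s$ toward $0$.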

An unsupervised learning method suitable for large categorical datasets

HoonP

May 27, 2022 22:05

I want to detect anomalies in a bank dataset using an unsupervised learning method. However, all columns except time and amount are categorical, and about half of them have more than 90 percent missing values. I'm currently using an autoencoder to approach it, but I wonder whether this will work. Also, because the purpose is to detect whether data is abnormal when data comes …

Topic: unsupervised-learning anomaly-detection categorical-data machine-learning

Category: Data Science
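For the categorical bank-data question above, a simple baseline worth comparing against the autoencoder (a different technique, not the asker's) is a frequency-based rarity score: treat "missing" as its own category, estimate how often each value occurs per column, and score each record by the sum of negative log frequencies, which is a negative log-likelihood under a column-independence assumption. The column names below are hypothetical.

```python
import math
from collections import Counter

MISSING = "<missing>"  # missing values become their own category

def _value(row, col):
    """Read a categorical value, mapping None or absent keys to MISSING."""
    value = row.get(col)
    return MISSING if value is None else value

def rarity_scores(rows):
    """Score categorical records by how rare their values are.

    rows: list of dicts mapping column name -> value (None = missing).
    Returns one score per row; higher means more unusual. Each column is
    modelled independently, so the score is a negative log-likelihood.
    """
    columns = sorted({col for row in rows for col in row})
    n = len(rows)
    counts = {col: Counter(_value(row, col) for row in rows) for col in columns}
    return [
        sum(-math.log(counts[col][_value(row, col)] / n) for col in columns)
        for row in rows
    ]

# Hypothetical records: nine common ones and one rare combination.
rows = [{"channel": "web", "branch": None}] * 9 + [{"channel": "atm", "branch": "X"}]
scores = rarity_scores(rows)
print(max(range(len(scores)), key=scores.__getitem__))  # index of the rarest record
```

To score records as they arrive, the counts would be fitted on historical data and need smoothing (for example, adding one to every count) so that a never-seen value does not produce a zero frequency.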

Anomaly detection in time series and replacing anomalies with past values

user3002936

May 26, 2022 03:05

I am trying to use anomaly detection to find anomalies in my time series and, when I find one, replace it with past values. I want to create upper and lower bounds for replacing those anomalies, and I think using past values will help me build these bounds. Is there any guidance or example where I can learn how to do this? Thanks!

Topic: anomaly-detection time-series python

Category: Data Science
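For the question above about bounds built from past values, a minimal sketch, assuming a one-dimensional series: compute the bounds at each step as mean ± k·std of the previous `window` already-cleaned values, and replace any point outside them with the last accepted value. The window length, `k`, and the carry-forward replacement are assumptions; clipping to the violated bound is an equally common choice.

```python
import numpy as np

def replace_anomalies(series, window=24, k=3.0):
    """Flag points outside mean +/- k*std of the previous `window` cleaned
    values and replace them with the last accepted value.

    Returns (cleaned_series, boolean mask of replaced points).
    """
    x = np.asarray(series, dtype=float)
    cleaned = x.copy()
    replaced = np.zeros(len(x), dtype=bool)
    for t in range(window, len(x)):
        past = cleaned[t - window:t]  # cleaned history, so spikes don't widen bounds
        mu, sigma = past.mean(), past.std()
        lower, upper = mu - k * sigma, mu + k * sigma
        if not lower <= x[t] <= upper:
            cleaned[t] = cleaned[t - 1]  # carry the previous accepted value forward
            replaced[t] = True
    return cleaned, replaced

series = 10 + np.sin(np.arange(100) / 3)
series[60] = 50.0  # injected spike
cleaned, replaced = replace_anomalies(series)
print(np.flatnonzero(replaced), cleaned[60])
```

Using the cleaned history rather than the raw one keeps a single spike from inflating the bounds for the next `window` steps; note that a perfectly constant window has zero std, so any change at all would be flagged there.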

Incremental learning on Autoencoder for anomaly detection

sj2000

May 24, 2022 10:44

I want to incrementally train my pre-trained autoencoder model on data received every minute. Based on this thread, successive calls to model.fit will incrementally train the model. However, the reconstruction error and overall accuracy of my model seem to be getting worse than they initially were. The code looks something like this:

    autoencoder = load_pretrained_model()
    try:
        while True:
            data = collect_new_data()
            autoencoder = train_model(data)  # Invokes autoencoder.fit()
            time.sleep(60)
    except KeyboardInterrupt:
        download_model(autoencoder)
        sys.exit(0)

The mean reconstruction error when my …

Topic: machine-learning-model autoencoder anomaly-detection machine-learning

Category: Data Science
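For the incremental-training question above, one common explanation for the degradation is catastrophic forgetting: each `fit` call on only the newest minute of data pulls the weights toward that minute. A frequently suggested mitigation is rehearsal, i.e. mixing a sample of older data into every update. The sketch below keeps such a sample with reservoir sampling; the class name `ReplayBuffer` and the 1:1 mix of new and old records are assumptions, and the actual Keras call is only indicated in a comment.

```python
import random

class ReplayBuffer:
    """Fixed-size uniform sample of every record seen so far (reservoir sampling)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, record):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(record)
        else:
            # Keep the new record with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = record

    def training_batch(self, new_data):
        """Return the new records plus an equally sized sample of older ones."""
        k = min(len(new_data), len(self.items))
        old = self.rng.sample(self.items, k)  # drawn before adding the new data
        for record in new_data:
            self.add(record)
        return list(new_data) + old

buffer = ReplayBuffer(capacity=10_000, seed=0)
# Inside the loop from the question, roughly:
#   batch = np.array(buffer.training_batch(collect_new_data()))
#   autoencoder.fit(batch, batch, epochs=1)
```

A lower learning rate for the incremental updates is often combined with this, so that a single minute of data cannot move the pre-trained weights too far.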

Decision trees for anomaly detection

giogix

May 24, 2022 03:07

Problem: From what I understand, a common method in anomaly detection consists of building a predictive model trained on non-anomalous training data and performing anomaly detection using the model's error when predicting on the observed data. This method requires the user to identify non-anomalous data beforehand. What if it's not possible to label non-anomalous data to train the model? Is there anything in the literature that explains how to overcome this issue? I have an idea, but I was …

Topic: anomaly-detection decision-trees random-forest clustering

Category: Data Science

Is it good to have 100% accuracy on validation?

farhanrbn

May 23, 2022 06:09

I'm still new to machine learning. Currently I'm building anomaly detection for flight data. It is multivariate time series data that includes the timestamp, latitude, longitude, velocity, and altitude of the aircraft. I'm splitting the data into train and test sets with an 80% ratio. I used a Keras LSTM autoencoder for anomaly detection. Here's my code:

    def create_sequence(data, time_step=None):
        Xs = []
        for i in range(len(data) - time_step):
            Xs.append(data[i:(i + time_step)])
        return np.array(Xs)
    # …

Topic: lstm keras anomaly-detection

Category: Data Science
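For the 100% validation accuracy question above, one thing worth checking is leakage between sliding windows: if windows are created first and then shuffled into train and test sets, overlapping windows put nearly identical samples on both sides. A minimal sketch, assuming the data is a NumPy array ordered by timestamp: split chronologically first, scale with training statistics only, and only then build windows with the same logic as the question's `create_sequence`. The name `chronological_windows` and the random stand-in data are hypothetical.

```python
import numpy as np

def create_sequence(data, time_step):
    """Sliding windows of length time_step (same logic as in the question)."""
    xs = [data[i:i + time_step] for i in range(len(data) - time_step)]
    return np.array(xs)

def chronological_windows(data, time_step, train_frac=0.8):
    """Split by time first, then window each part so no window spans both."""
    split = int(len(data) * train_frac)
    train, test = data[:split], data[split:]
    # Scale with statistics from the training part only.
    mean, std = train.mean(axis=0), train.std(axis=0)
    std = np.where(std == 0, 1.0, std)
    train, test = (train - mean) / std, (test - mean) / std
    return create_sequence(train, time_step), create_sequence(test, time_step)

# Stand-in for 4 features such as latitude, longitude, velocity, altitude.
data = np.random.default_rng(0).normal(size=(1000, 4))
x_train, x_test = chronological_windows(data, time_step=30)
print(x_train.shape, x_test.shape)
```

Even with a clean split, an autoencoder evaluated only on normal flights can look perfect; accuracy becomes informative only once the test period contains known anomalies.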

How to compute threshold?

warriorforce

May 22, 2022 18:58

I would like to detect anomalies in univariate time series data. Most examples on the internet show that, after getting the model's predictions, you calculate a threshold from the training data and an MAE test loss, then compare them to detect anomalies. Is this the correct way of doing it? Shouldn't each dataset have its own threshold value? Also, why do all of the examples only use MAE loss for anomalies?

Topic: keras anomaly-detection regression predictive-modeling

Category: Data Science
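The approach described in the threshold question above usually looks like this: compute a per-window error (MAE here) on the training data, take a single cut-off from that distribution (its maximum, or a high percentile), and flag test windows whose error exceeds it. MAE is a convention rather than a requirement; MSE or any other per-window error plugs in the same way. The sketch assumes predictions are already available as arrays of shape (n_windows, time_steps); the function names and toy numbers are hypothetical.

```python
import numpy as np

def window_mae(actual, predicted):
    """Mean absolute error per window; arrays shaped (n_windows, time_steps)."""
    return np.mean(np.abs(actual - predicted), axis=1)

def fit_threshold(train_errors, percentile=None):
    """One global threshold from training errors: the max, or a percentile."""
    if percentile is None:
        return float(np.max(train_errors))
    return float(np.percentile(train_errors, percentile))

train_actual = np.zeros((4, 3))
train_pred = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2],
                       [0.0, 0.0, 0.3], [0.1, 0.0, 0.2]])
threshold = fit_threshold(window_mae(train_actual, train_pred))

test_actual = np.zeros((2, 3))
test_pred = np.array([[0.1, 0.2, 0.0], [1.0, 0.9, 1.1]])
is_anomaly = window_mae(test_actual, test_pred) > threshold
print(threshold, is_anomaly)
```

The threshold is fitted once on training errors and then reused on test data, because it describes what "normal" error looks like; fitting a separate threshold on the test data would let the anomalies themselves shift it.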

Anomaly detection - relation between thresholds and anomalies

Giordano

May 20, 2022 18:06

I'm developing an anomaly detection program in Python. The main idea is to create a new LSTM model every day, train it on the previous 7 days, and predict the next day. Then, using thresholds, I find anomalies day by day. I've already implemented this, and these thresholds are working well: the upper threshold equals trimmed_mean + (K * interquartile_range), and the lower threshold equals trimmed_mean - (K * interquartile_range), where trimmed_mean and interquartile_range are calculated on the prediction error (real curve …

Topic: unsupervised-learning anomaly-detection time-series python machine-learning

Category: Data Science
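The thresholds described in the question above can be sketched as follows, assuming the prediction error is real minus predicted (the excerpt is cut off at that point, so this definition is an assumption). The trimming proportion of 10% per tail and K = 2 are placeholders; `scipy.stats.trim_mean` offers the same trimmed mean if SciPy is available.

```python
import numpy as np

def trimmed_mean(x, proportion=0.1):
    """Mean after dropping `proportion` of the values from each tail."""
    x = np.sort(np.asarray(x, dtype=float))
    cut = int(proportion * len(x))
    return x[cut:len(x) - cut].mean()

def error_thresholds(errors, k=2.0, proportion=0.1):
    """Lower/upper thresholds: trimmed_mean -/+ K * interquartile_range."""
    q1, q3 = np.percentile(errors, [25, 75])
    centre = trimmed_mean(errors, proportion)
    iqr = q3 - q1
    return centre - k * iqr, centre + k * iqr

# Hypothetical prediction errors (real - predicted) for one day.
errors = np.array([-1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 0.2, -0.2, 0.1, 8.0])
lower, upper = error_thresholds(errors)
print(lower, upper, np.flatnonzero((errors < lower) | (errors > upper)))
```

Both the trimmed mean and the IQR ignore the tails, so a few large errors on the day being evaluated barely move the thresholds that are supposed to catch them.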

An autoencoder setup for anomaly detection

Riva11

May 19, 2022 16:36

I am doing anomaly detection using machine learning. I have tried different models such as isolation forest, SVM, and KNN. The maximum accuracy I can get from any of them is $80\%$ on my dataset, which contains $5$ features and $4000$ data samples, $18\%$ of which are anomalous. When I use an autoencoder and tune the reconstruction loss threshold, I can get $92\%$ accuracy, but the hidden-layer setup of the autoencoder does not seem right despite the …

Topic: autoencoder anomaly-detection dimensionality-reduction

Category: Data Science

Is it impossible to predict defects with data that are not labeled?

hahaha

May 19, 2022 02:52

I have manufacturing data with 10 process variables. The data is not labeled as normal or defective. It's tabular data. Do you know of a paper that uses only unlabeled data to predict defects or to find the variables that affect them? I thought about using outlier detection algorithms (Isolation Forest, Autoencoder) to predict defects, but I can't find a way because I don't know the exact defect rate. I can't think of a way to verify it, so I'd …

Topic: unsupervised-learning anomaly-detection time-series

Category: Data Science

Is there a way to check if I got a "good price" on something?

Mohammad Athar

May 18, 2022 13:08

I'm looking at some data. Actually, the Boston Housing dataset is probably a good proxy for it: https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html I'm wondering if there's a way to predict if I got a "good price" given certain conditions. So something like, if I'm given a tuple such as CRIM, ZN, INDUS = (0.006320,18, 2.31), then is a house price of 50 significantly higher or lower than expected? This isn't quite vanilla anomaly detection, because the combination of a particular CRIM, ZN, INDUS may …

Topic: anomaly-detection

Category: Data Science
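For the "good price" question above, one way to frame it as conditional anomaly detection is to model the expected price given the features and then ask how many residual standard deviations the observed price sits from that expectation. The sketch below uses ordinary least squares via `np.linalg.lstsq` on synthetic stand-in data rather than the actual Boston Housing file; the linear model and the function name `price_residual_z` are assumptions, and any regression model could take its place.

```python
import numpy as np

def price_residual_z(features, prices, query, query_price):
    """Fit price ~ features by least squares, then express how far query_price
    is from the predicted price in units of the residual standard deviation."""
    X = np.column_stack([np.ones(len(features)), features])  # add intercept
    coef, *_ = np.linalg.lstsq(X, prices, rcond=None)
    residuals = prices - X @ coef
    sigma = residuals.std(ddof=X.shape[1])  # sqrt(SSR / (n - parameters))
    expected = np.concatenate([[1.0], query]) @ coef
    return expected, (query_price - expected) / sigma

# Synthetic stand-in for (CRIM, ZN, INDUS) -> price.
rng = np.random.default_rng(0)
features = rng.uniform(0, 20, size=(200, 3))
prices = 10 + features @ np.array([0.5, -0.2, 1.0]) + rng.normal(0, 1, 200)

expected, z = price_residual_z(features, prices, np.array([0.00632, 18, 2.31]), 50.0)
print(round(float(expected), 2), round(float(z), 1))
```

A large positive z means the price is high for those conditions; this ignores uncertainty in the fitted coefficients, so a proper prediction interval would be somewhat wider, especially for feature combinations far from the training data.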

Detecting abundance of a certain periodic pattern in a time series?

May 18, 2022 10:02

I am really stumped at the moment about how to solve a particular problem. I have many time series like this: This represents the number of hours a person spends on a website each day throughout the year. Any days where they are not seen to be using the website have zero values, rather than missing values. What I really want to do is to calculate a metric telling me to what extent there is a consistent "1 hour per …

Topic: forecasting anomaly-detection correlation time-series data-mining

Category: Data Science

Unsupervised anomaly detection for univariate high-frequency time series data?

user10296606

May 18, 2022 05:07

I have a univariate time series (there is one value per time sample) (sampling time: 66.66 microseconds, number of samples/sampling time = 151) coming from a Scala customer. This time series contains time frames, each of which is 8K (frequencies) × 151 (time samples) in 0.5 s (overall 1.2288 million samples per half second). I need to find anomalies across the different rows (frequencies) and report which rows (frequencies) are anomalous, using an unsupervised learning method. Do you have an …

Topic: pipelines unsupervised-learning anomaly-detection scala time-series

Category: Data Science

How to detect anomalies?

warriorforce

May 15, 2022 11:27

I have time series data with one value per day for a year (there is one column with temperature data). I am using an autoencoder to train a reconstruction model with MSE loss. First, I normalized the data using the following code:

    training_mean = preprocessed_data.mean()
    training_std = preprocessed_data.std()
    df_training_value = (preprocessed_data - training_mean) / training_std

After this, I create sequences from the data. I am not sure if it's OK to choose 32 time steps, but otherwise I can't fit the model. …

Topic: autoencoder anomaly-detection neural-network

Category: Data Science

How to set a threshold for anomaly detection

user12

May 14, 2022 16:03

I read a research paper in which the authors use a threshold for anomaly detection. The threshold is chosen so that a certain proportion of the validation dataset is labeled as anomalous. How does this concept make sense?

Topic: anomaly-detection machine-learning

Category: Data Science
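The idea in the question above amounts to fixing an expected contamination rate: if you believe roughly p of the data is anomalous, choose the threshold as the (1 − p) quantile of the anomaly scores on the validation set, so that about that share is flagged. scikit-learn's `contamination` parameter (for example in `IsolationForest`) follows the same idea. The sketch assumes higher scores mean more anomalous; the 5% rate and the stand-in scores are placeholders, not the paper's values.

```python
import numpy as np

def contamination_threshold(val_scores, contamination=0.05):
    """Threshold such that about `contamination` of validation scores exceed it."""
    return float(np.quantile(val_scores, 1.0 - contamination))

val_scores = np.arange(100, dtype=float)  # stand-in anomaly scores
thr = contamination_threshold(val_scores, 0.05)
flagged = val_scores > thr
print(thr, flagged.sum())
```

This makes sense when the anomaly rate can be estimated from domain knowledge; if the true rate differs, the threshold will systematically over- or under-flag.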

Which machine learning technique can be used for predictive log analysis?

user3449212

May 13, 2022 13:47

I have log data with 100k records and these parameters. It looks like this. Message types can be helpful for detecting anomaly types: out of 15 message types in total, 5 are considered anomalies, e.g. invalid user, connection closed by invalid user. Option 1 - Text classification model: create a classification model using the text message, which classifies each record based on its message text. But I want to use predictive analytics with the date/time parameters so that it can help for …

Topic: forecasting anomaly-detection deep-learning predictive-modeling machine-learning

Category: Data Science
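For the log-analysis question above, one way to bring the date/time parameters into play is to turn the records into a time series first: count anomalous messages per hour, then forecast or monitor that series. A minimal sketch, assuming each record is a `(datetime, message)` pair; `ANOMALOUS_TYPES` lists only the two examples from the question (the other three are unknown), and the sample records are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Two of the five anomalous message types mentioned in the question.
ANOMALOUS_TYPES = {"invalid user", "connection closed by invalid user"}

def hourly_anomaly_counts(records):
    """Turn (timestamp, message) log records into a count of anomalous
    messages per hour, a time series a forecasting model can use."""
    counts = Counter()
    for ts, message in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[hour] += int(message in ANOMALOUS_TYPES)  # hours with no anomalies get 0
    return dict(sorted(counts.items()))

records = [
    (datetime(2022, 5, 13, 10, 5), "invalid user"),
    (datetime(2022, 5, 13, 10, 40), "session opened"),
    (datetime(2022, 5, 13, 11, 2), "connection closed by invalid user"),
    (datetime(2022, 5, 13, 11, 30), "invalid user"),
]
print(hourly_anomaly_counts(records))
```

Hours with no log records at all are still absent from the result; they would need to be filled with zeros before feeding the series to a forecasting model.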
