Time Series Data Noise Handling Questions

I have manufacturing time series data, as shown in the picture. The average of the two variables is about 100. However, there are noise values of 6500 and 65000, which differ greatly from the other values. I think these are data collection errors or misrecorded values. I want to handle those values and then analyze the time series; is there any way to do this? I would like to use expert domain knowledge to define a valid range and remove them, but it is difficult to meet an expert.
Category: Data Science
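
One possible starting point when no domain expert is available is a robust outlier rule based on the median and the median absolute deviation (MAD), which extreme values such as 6500 or 65000 barely influence. The sketch below is only an illustration: the function name, the threshold of 3.5, and the sample values are assumptions made up for the example, not taken from the question's data.

```python
import numpy as np

def mask_outliers_mad(values, threshold=3.5):
    """Return a copy of `values` with robust outliers replaced by NaN."""
    x = np.asarray(values, dtype=float)
    median = np.nanmedian(x)
    mad = np.nanmedian(np.abs(x - median))
    if mad == 0:
        # All points equal the median (or nearly so); nothing can be scored.
        return x.copy()
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the bulk of the data is roughly normal.
    robust_z = 0.6745 * (x - median) / mad
    cleaned = x.copy()
    cleaned[np.abs(robust_z) > threshold] = np.nan
    return cleaned

# Synthetic values around 100 with two spikes (illustrative only).
series = np.array([98.0, 101.0, 99.5, 6500.0, 100.2, 102.1, 65000.0, 97.8])
cleaned = mask_outliers_mad(series)
```

The NaN gaps can then be interpolated or dropped before the time series analysis, depending on what the downstream method tolerates.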

Choosing attributes for k-means clustering

k-means clustering tries to minimize the within-cluster scatter and maximize the distances between clusters. It does so on all attributes. I am learning about this method on several datasets. To illustrate, in one of the datasets countries are compared based on attributes related to their Human Development Index. However, some of the attributes are completely unrelated to this dimension, for example the total population of each country. How should I deal with these attributes? As mentioned before, k-means tries to minimize the scatter …
Category: Data Science
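
Because k-means uses every column it is given, one common approach is to keep only the attributes that matter for the comparison and to standardize them so that no single column dominates the distance. The sketch below shows just that preprocessing step; the column names and numbers are made-up toy values, not the dataset from the question.

```python
import numpy as np

# Hypothetical country table (toy values for illustration only).
columns = ["life_expectancy", "years_of_schooling", "gni_per_capita", "population"]
data = np.array([
    [82.0, 13.0, 55000.0, 5.0e6],
    [61.0, 6.0, 2000.0, 2.0e8],
    [75.0, 10.0, 15000.0, 1.4e9],
    [70.0, 8.0, 9000.0, 3.0e7],
])

# Keep only the attributes considered relevant; "population" is left out.
relevant = ["life_expectancy", "years_of_schooling", "gni_per_capita"]
idx = [columns.index(name) for name in relevant]
selected = data[:, idx]

# Standardize each remaining column to zero mean and unit variance.
scaled = (selected - selected.mean(axis=0)) / selected.std(axis=0)
```

The `scaled` array is what would be passed to the clustering step; which attributes count as relevant remains a modelling decision.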

Finding the delayed effect of a change in input

I am trying to figure out the delay between changing the speed of a pump that pumps a modifier into a process and the change in amps drawn by an extruder at the end of the process. The amps drawn change constantly and are affected by other variables, but they are held within a range by changing the speed of the modifier pump. Because the amps drawn are constantly changing, you can't just look at the trend line …
Topic: time noise
Category: Data Science
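
A delay of this kind is often estimated by correlating the input against time-shifted copies of the output and taking the shift with the strongest correlation. The sketch below assumes both signals are sampled at the same regular interval and have equal length; the function name and the synthetic signals are hypothetical and only illustrate the idea.

```python
import numpy as np

def estimate_delay(input_signal, output_signal, max_lag):
    """Return the lag (in samples) where |correlation| between input and output peaks."""
    x = np.asarray(input_signal, dtype=float)
    y = np.asarray(output_signal, dtype=float)
    x = x - x.mean()
    y = y - y.mean()
    best_lag, best_score = 0, -np.inf
    for lag in range(max_lag + 1):
        # Pair x[t] with y[t + lag].
        a = x[: len(x) - lag]
        b = y[lag:]
        score = abs(np.dot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic example: the output reacts to the input 7 samples later, with opposite sign.
rng = np.random.default_rng(0)
pump_speed = rng.normal(size=500)
amps = -0.8 * np.roll(pump_speed, 7) + 0.1 * rng.normal(size=500)
delay = estimate_delay(pump_speed, amps, max_lag=30)
```

The absolute value is used because the relationship may be negative. With real process data, other variables will blur the peak, so the result is better read as a rough estimate.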

How to create a complex Gaussian random noise with a specific covariance matrix

I am trying to generate complex Gaussian white noise with zero mean and a covariance matrix equal to a specific matrix that is assumed to be given. Let i be a point on the grid of the x axis, where there are N points on the axis. The problem is to generate a complex-valued random value at each point (call the random value at point i $y_i$) that obeys a Gaussian distribution …
Category: Data Science
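
One standard construction, sketched here under the assumption that the noise is circularly symmetric and that the given covariance $C = E[y y^H]$ is Hermitian and positive definite, is to factor $C = L L^H$ and multiply unit-variance complex white noise by $L$. The function name and the 2×2 example matrix are made up for illustration.

```python
import numpy as np

def complex_gaussian_noise(cov, n_samples, rng=None):
    """Draw zero-mean circular complex Gaussian vectors with covariance `cov`.

    Returns an array of shape (N, n_samples), one realisation per column.
    """
    rng = np.random.default_rng() if rng is None else rng
    cov = np.asarray(cov, dtype=complex)
    n = cov.shape[0]
    chol = np.linalg.cholesky(cov)  # cov == chol @ chol.conj().T
    # Unit-variance complex white noise: real and imaginary parts
    # each carry variance 1/2.
    real = rng.standard_normal((n, n_samples))
    imag = rng.standard_normal((n, n_samples))
    white = (real + 1j * imag) / np.sqrt(2)
    return chol @ white

# Illustrative Hermitian positive-definite covariance (assumed, not from the question).
cov = np.array([[2.0, 0.5 + 0.5j],
                [0.5 - 0.5j, 1.0]])
samples = complex_gaussian_noise(cov, 100_000, rng=np.random.default_rng(0))
empirical = samples @ samples.conj().T / samples.shape[1]
```

Here `empirical` should approach `cov` as the number of samples grows, which gives a quick sanity check on the construction.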

How to group every data point with HDBSCAN to some group to have no noise?

TASK I am clustering products with about 70 dimensions, e.g. price, rating 5/5, product tag (cleaning, toy, food, fruits). I use HDBSCAN to do it. GOAL When users come to our site, I want to show them products similar to what they are viewing. QUESTION How can I get every data point to be part of a group, so that there is no noise? CODE clusterer = hdbscan.HDBSCAN(min_cluster_size=10,#smallest collection of data points you consider a cluster min_samples=1 …
Category: Data Science
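
HDBSCAN labels noise points with -1. One simple workaround, which is a post-processing step and not a feature of HDBSCAN itself, is to assign each noise point to the cluster whose centroid is nearest. The sketch below assumes the features are already numeric and scaled; the function name and toy points are hypothetical.

```python
import numpy as np

def assign_noise_to_nearest_cluster(points, labels):
    """Replace -1 labels with the label of the nearest cluster centroid."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels).copy()
    cluster_ids = np.unique(labels[labels >= 0])
    if cluster_ids.size == 0:
        return labels  # no clusters to assign to
    centroids = np.stack([points[labels == c].mean(axis=0) for c in cluster_ids])
    noise = labels == -1
    # Distance from every noise point to every centroid: (n_noise, n_clusters).
    dists = np.linalg.norm(points[noise][:, None, :] - centroids[None, :, :], axis=2)
    labels[noise] = cluster_ids[dists.argmin(axis=1)]
    return labels

# Toy example: two clusters and two points that were labelled as noise.
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.5, 0.4], [4.5, 4.8]]
labels = [0, 0, 1, 1, -1, -1]
filled = assign_noise_to_nearest_cluster(points, labels)
```

Centroids are a crude summary for the irregular cluster shapes HDBSCAN can find, so for a product recommender a nearest-neighbour lookup may be a better fit than forcing every point into a cluster.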

What is the correct way to perform normalization on data in an autoencoder?

I am working on an anomaly detection problem and using an autoencoder to denoise a given input. I trained the network on normal (anomaly-free) data, so the model predicts the normal state of a given input. Normalizing the input is essential for my dataset. The problem with normalization arises when the noise value is very high compared to the rest of the dataset: the prediction then follows the noise. For example, if I add noise (delta=300) to 80% of the data and perform normalization on the dataset, which has a mean value of 250 and a standard deviation …
Category: Data Science
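
A frequent cause of this behaviour is computing the mean and standard deviation on data that already contains the noise. One option is to fit the normalization statistics on the clean training data only and reuse them unchanged at inference time. The sketch below illustrates this with synthetic data; the helper names and the standard deviation of 10 are assumptions, since the question's value is not shown.

```python
import numpy as np

def fit_standardizer(train):
    """Compute per-feature mean and std from clean training data only."""
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero
    return mean, std

def apply_standardizer(data, mean, std):
    """Standardize `data` with previously fitted statistics."""
    return (np.asarray(data, dtype=float) - mean) / std

rng = np.random.default_rng(0)
clean_train = rng.normal(loc=250.0, scale=10.0, size=(1000, 1))  # synthetic normal data
noisy_input = clean_train[:5] + 300.0  # synthetic samples shifted by delta=300

mean, std = fit_standardizer(clean_train)
train_scaled = apply_standardizer(clean_train, mean, std)
noisy_scaled = apply_standardizer(noisy_input, mean, std)
```

With statistics fixed from clean data, the shifted samples land far outside the range seen during training, so the reconstruction error has a chance to expose them.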

Noise Elimination with majority vote filtering

I have a dataset with label noise which I want to clean with majority/consensus vote filtering. This means I will divide the data into K folds and train an ensemble model. Then, using the predictions on the data, I will remove rows which are misclassified by most (majority voting) or all (consensus voting) of the models. I have a few questions to which I can't find answers elsewhere: how to decide what models to use in the ensemble the dataset is very …
Category: Data Science
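
The voting step itself can be sketched as follows, assuming the out-of-fold predictions of each ensemble member have already been collected into one array. The function name, the `mode` parameter, and the toy predictions are hypothetical.

```python
import numpy as np

def vote_filter(predictions, labels, mode="majority"):
    """Return a boolean mask of rows to keep.

    predictions: array of shape (n_models, n_samples) with out-of-fold predicted labels.
    labels:      array of shape (n_samples,) with the given (possibly noisy) labels.
    """
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    n_models = predictions.shape[0]
    # Count, per sample, how many models disagree with the given label.
    n_wrong = (predictions != labels).sum(axis=0)
    if mode == "majority":
        flagged = n_wrong > n_models / 2
    elif mode == "consensus":
        flagged = n_wrong == n_models
    else:
        raise ValueError("mode must be 'majority' or 'consensus'")
    return ~flagged

# Toy example with three models and four samples.
preds = np.array([[0, 1, 1, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 0]])
given = np.array([0, 1, 1, 1])
keep_majority = vote_filter(preds, given, mode="majority")
keep_consensus = vote_filter(preds, given, mode="consensus")
```

Consensus voting removes fewer rows and is therefore the more conservative choice, which can matter when the dataset is small.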

Reducing noise in non-normally distributed data with standard deviation?

I used MATLAB code and obtained two different row vectors, A=1×18 and B=1×350. I need to remove the noisy data from each row vector separately by using standard deviation. The problem is that the data in both row vectors are NOT normally distributed. Is there any way I can use standard deviation to reduce noise in non-normally distributed data? Any guidance will be appreciated. Thanks
Topic: noise
Category: Data Science
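
Since a standard deviation rule leans on normality, a quartile-based rule is sometimes used in its place for skewed data. The sketch below swaps in the interquartile range (IQR) fences and is written in Python, not MATLAB; the function name, the multiplier k=1.5, and the example vector are assumptions for illustration.

```python
import numpy as np

def iqr_filter(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    keep = (x >= q1 - k * iqr) & (x <= q3 + k * iqr)
    return x[keep]

# Made-up vector with one obvious outlier.
a = [1, 2, 2, 3, 3, 3, 4, 4, 5, 50]
a_clean = iqr_filter(a)
```

Each row vector would be filtered separately. With only 18 values, as in vector A, the quartile estimates are coarse, so the result deserves a visual check.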

KDE Sampling with negative density and/or class-specific weighting

I have a dataset which contains two overlapping distributions/classes of points. I have been trying to sample from just one of these distributions/classes using the scikit-learn KernelDensity class, but I am finding this does not work well in overlapping regions. Is there a way to do this sort of KDE sampling that also takes into account/avoids areas where these two distributions overlap? Ideally I would like to sample more often in non-overlapping areas or, when this is not …
Category: Data Science
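
One way to favour non-overlapping regions is rejection sampling: draw candidates from the KDE of the wanted class and accept each with probability $p_A / (p_A + p_B)$, so candidates in areas dominated by the other class are mostly discarded. The sketch below uses a hand-written isotropic Gaussian KDE in NumPy and does not use scikit-learn; the function names, the bandwidth, and the synthetic classes are all assumptions.

```python
import numpy as np

def gaussian_kde_density(query, data, bandwidth):
    """Evaluate an isotropic Gaussian KDE built on `data` at each row of `query`."""
    diff = query[:, None, :] - data[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)
    dim = data.shape[1]
    norm = (2 * np.pi * bandwidth ** 2) ** (dim / 2)
    return np.exp(-sq_dist / (2 * bandwidth ** 2)).mean(axis=1) / norm

def sample_class_a(a, b, n_samples, bandwidth=0.3, rng=None):
    """Sample from the KDE of class A, down-weighting regions where class B is dense."""
    rng = np.random.default_rng() if rng is None else rng
    accepted, n_accepted = [], 0
    while n_accepted < n_samples:
        # Sampling from a Gaussian KDE: pick a data point, add kernel noise.
        centers = a[rng.integers(len(a), size=n_samples)]
        candidates = centers + bandwidth * rng.standard_normal(centers.shape)
        p_a = gaussian_kde_density(candidates, a, bandwidth)
        p_b = gaussian_kde_density(candidates, b, bandwidth)
        # Accept with probability p_a / (p_a + p_b).
        accept = rng.random(n_samples) < p_a / (p_a + p_b)
        accepted.append(candidates[accept])
        n_accepted += int(accept.sum())
    return np.concatenate(accepted)[:n_samples]

# Synthetic 1-D classes that overlap between roughly 1 and 2.
rng = np.random.default_rng(0)
class_a = rng.normal(0.0, 1.0, size=(300, 1))
class_b = rng.normal(3.0, 1.0, size=(300, 1))
samples = sample_class_a(class_a, class_b, 200, rng=rng)
```

The accepted samples follow a density proportional to $p_A \cdot p_A / (p_A + p_B)$, not $p_A$ itself, which is the intended bias away from the overlap.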

Separating image signal from constant noise sources

I'm working on an image signal from a sensor where the incoming signal contains a high degree of constant noise. There are multiple noise patterns, with both very low and very high frequencies, but not as high as Gaussian or uniform noise. I want to retrieve the original signal from a set of images with as much of this noise removed as possible. I'm thinking about trying to formulate a method which is a modified Independent Component Analysis (ICA). Standard noise removal procedures …
Category: Data Science

Multi-task learning for improving segmentation

I am building a multi-task model, where my main task is segmentation and my auxiliary task is either denoising or image inpainting. The goal is to try to improve the quality of the segmentation with the multi-task learning approach, where certain representations are shared across the tasks, thus reducing their noise and increasing the segmentation accuracy. Currently, my approach is as follows: Manually add noise to the image (or a hole if the auxiliary task is inpainting) Fit a model …
Category: Data Science

Deep learning model for very sparse object detection in noisy images

I am trying to build a model that takes in very noisy 200x200 greyscale images of spatially sparse objects and attempts to localise them with bounding boxes. The objects are very thin streaks (data of particle tracks from a particle accelerator) which form triangular patterns, and the background is swamped with Gaussian noise so that the patterns are very faint. There is only one such pattern that I need to detect per positive example. I was wondering what good …
Category: Data Science

Using numpy to enter noise into data

I am new to data science and have to generate 200 numbers from a uniform distribution, set this as x, and generate y data using x and injecting noise from the Gaussian distribution: y = 12x-4 + noise. My approach: x = numpy.random.rand(200) --> This will generate 200 numbers from a uniform distribution. I am not sure how to inject noise from the Gaussian distribution; probably it's like z = numpy.random.randn(200) and y = 12 * x - 4 + …
Topic: noise gaussian
Category: Data Science
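
The approach described in the question can be sketched with NumPy's `Generator` API. The noise standard deviation of 1.0 and the seed are assumptions, since the question does not state them.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility

# 200 draws from the uniform distribution on [0, 1).
x = rng.uniform(0.0, 1.0, size=200)

# Gaussian noise with zero mean; the scale (standard deviation) is an assumption.
noise = rng.normal(loc=0.0, scale=1.0, size=200)

# Linear signal plus noise.
y = 12 * x - 4 + noise
```

Changing `scale` controls how strongly the noise perturbs the underlying line.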

TypeError: __init__() missing 1 required positional argument: 'num_features'

I was trying to denoise an image using Deep Image Prior. When I use ResNet as the architecture, I get an error. INPUT = 'noise' # 'meshgrid' get_noise function pad = 'reflection' OPT_OVER = 'net' # 'net,input' reg_noise_std = 1./30. # set to 1./20. for sigma=50 LR = 0.01 OPTIMIZER='adam' # 'LBFGS' show_every = 100 exp_weight=0.99 num_iter = 1000 input_depth = 3 figsize = 4 net = get_net(input_depth, 'ResNet', pad, upsample_mode='bilinear').type(dtype) net_input = get_noise(input_depth, INPUT, (img_pil.size[1], img_pil.size[0])).type(dtype).detach() # Compute number of …
Category: Data Science

Model Tree M5 - Robustness to Data Quality Issues

I am currently investigating the M5 tree algorithm by Quinlan (1992), link here: https://sci2s.ugr.es/keel/pdf/algorithm/congreso/1992-Quinlan-AI.pdf An example of a linear regression model of the algorithm can be seen below: An implementation of the model similar to scikit-learn can be found here: https://github.com/ankonzoid/LearningX/tree/master/advanced_ML/model_tree The M5 model is a more advanced implementation of standard decision trees such as ID3 or C4.5. Instead of simple binary splits of the training features, the data is split according to the Standard Deviation Reduction, calculated as follows …
Category: Data Science
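
For reference, the standard deviation reduction used in M5 is usually written as $SDR = sd(T) - \sum_i \frac{|T_i|}{|T|} \, sd(T_i)$, where $T$ is the set of target values at a node and $T_i$ are the subsets produced by a candidate split. A minimal sketch follows, using the population standard deviation; the function name and the toy target values are made up.

```python
from statistics import pstdev

def standard_deviation_reduction(targets, subsets):
    """SDR = sd(T) - sum(|T_i| / |T| * sd(T_i)) over the non-empty subsets of a split."""
    n = len(targets)
    weighted = sum(len(s) / n * pstdev(s) for s in subsets)
    return pstdev(targets) - weighted

# Toy target values at a node and two candidate splits.
targets = [1, 2, 3, 10, 11, 12]
good_split = standard_deviation_reduction(targets, [[1, 2, 3], [10, 11, 12]])
bad_split = standard_deviation_reduction(targets, [[1], [2, 3, 10, 11, 12]])
```

The split that separates the low values from the high values yields the larger reduction, so it would be preferred. A single extreme target value can inflate $sd(T)$ and shift which split wins, which is relevant to the robustness question.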

How to remove noise using morphological filtering

I have two groups of dots that both contain noise between them: The line that separates the two groups in the picture is diagonal in shape. I tried to use morphological filtering on this image to remove the noise between these two groups but failed. This is the code that I tried to run on this image: from skimage.morphology import opening, square new_image = opening(image, square(3)) It did remove a little bit of noise, but not enough for them to …
Category: Data Science

Denoising Prior to Image Classification

From what I have read, denoising during preprocessing for image classification tasks seems to be a bit controversial. While on one hand it might improve classification accuracy, the computational complexity seems to discourage a lot of people. Rather, I've seen a lot of claims, but hardly anything I can call concrete evidence, arguing that noise "would not matter" for a sufficiently deep neural network. That being said, assuming that I have decided to perform some kind of denoising on …
Category: Data Science

Which algorithm to use to identify clusters with a similar value?

Here is an example of my problem: 10000 observations of people with several features [age, gender, region, number of sons, ...] and a value to predict, "income". There is no general relationship between the features and income, so a normal regression gives poor results. Nevertheless, I want to identify specific patterns where this relationship exists. For instance: [young, woman, 2 sons] -> high income [young, man] -> low income ... Maybe doing a clustering on the features, and then a regression …
Category: Data Science

Methods for learning with noisy labels

I am looking for a specific deep learning method that can train a neural network model with both clean and noisy labels. More precisely, I would like this method to be able to leverage noisy data as well, for instance by not fully "trusting" noisy data, or weighting samples, or deciding whether to use a specific sample at all for learning. But primarily, I am looking for inspiration. Details: My task is sequence-to-sequence NLP, I have both clean pairs of …
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.