Search one 2D distribution for the point cluster most similar to another 2D distribution

Given a hand-drawn constellation (a 2D distribution of points) and a map of all stars, how would you find the actual star distribution most similar to the drawn distribution? If it's helpful, suppose we can define some maximum allowable threshold of distortion (e.g. a maximum Kolmogorov-Smirnov distance) and we want to find one or more distributions of stars that match the hand-drawn distribution. I keep getting hung up on the fact that the hand-drawn constellation has no notion of scale …
Category: Data Science

An error in Figure 13.13 of Bishop's PRML?

This figure comes from Chapter 13, Sequential Data, in Bishop's Pattern Recognition and Machine Learning (PRML). I suspect it contains an error: the $$ p(\mathbf{x}_n|z_{n+1,k}), k=1,2,3 $$ in Figure 13.13 should be $$ p(\mathbf{x}_{n+1}|z_{n+1,k}), k=1,2,3 \,. $$ Can anyone who has read this book give me some advice?
Category: Data Science
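
For reference, here is the backward recursion of the forward-backward algorithm for a hidden Markov model, written in the question's notation. It is restated from the standard derivation, not copied from the book's figure; the symbol $\beta$ for the backward message is an assumption, since it does not appear in the excerpt above.

$$ \beta(\mathbf{z}_n) = \sum_{\mathbf{z}_{n+1}} \beta(\mathbf{z}_{n+1}) \, p(\mathbf{x}_{n+1}|\mathbf{z}_{n+1}) \, p(\mathbf{z}_{n+1}|\mathbf{z}_n) $$

In this recursion the emission term is evaluated at $\mathbf{x}_{n+1}$, which agrees with the correction the question proposes.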

Identify the existence of a wall-like structure in a given int array

The idea is that I will query an API endpoint, which will return an array of entries, each consisting of a price value and a quantity value: [price, quantity]. It is highly likely that this dataset contains structures where the quantity suddenly increases for a given price range compared to its surroundings: basically, a wall-like structure. The example below shows a range of price values, with the quantity value shown as a heatmap. …
Category: Data Science

Identifying patterns in tabular data

I have a set of tables, each containing a few thousand entries and tens of columns of machine status values from production. The entries are of mixed types, such as string, float, or timestamp. Each table is pre-labeled with a certain failure mode (e.g. valve setting jump, a problem with inlet A, etc.). This could be due to a jump in the mean values in some columns or a special correlation between several columns. This is what I refer to as a …
Category: Data Science

Approach for finding patterns in daily event data

I am a software engineer but have zero data science background, so apologies for this basic question. I would like to find correlations between different behaviors in my daily life and different outcomes. Some examples would be: How does "amount of screen time" correlate with "time to fall asleep"? How does "amount of exercise" correlate with "amount of sleep"? How does "amount of alcohol" correlate with "amount of REM sleep"? How does "amount of sunlight exposure" correlate with "daily energy"? [these …
Category: Data Science

How to recognize this type of pattern in my data?

I have a question that keeps bugging me, and I can't find an actual solution to it online. I would like to detect a certain pattern in my data, like the example in my picture surrounded by the yellow rectangle. I would like to know an approach I could use to find these; all I need is a pointer to what I should look into or read to solve my problem. I would …
Category: Data Science

How can I correctly identify an item within a larger image, but also detect whether the item is in the correct orientation?

I have some machinery at work with a small sticker I am trying to detect within a larger image. I am familiar with object detection techniques based on a trained classification model, but to further complicate the matter, the following scenarios are possible and need to be identified: the sticker exists (correct orientation); the sticker exists but is inside-out; the sticker exists but is upside-down; the sticker exists but is inside-out AND upside-down; the sticker does not exist in the …
Category: Data Science

Graph Pattern Matching Library

Assume that in an application, the user gives us a graph and we want to consider it as a pattern and find all occurrences of the pattern in a graph database (like neo4j). If we knew what the pattern is, we could write the pattern as a static query and run it against our database. However, now we do not know what the pattern is beforehand and receive it from the user in the form of a graph. How can …
Category: Data Science

What algorithms exist for identifying repeating patterns in a single image?

I am looking for algorithms or models for detecting and identifying repeated patterns in a single image. For example, an arbitrary smaller image might be pasted at random locations in the image. In the situation at hand, no prior information is known about the appearance of the object or pattern. Do any algorithms/models for this exist?
Category: Data Science

Extracting linear trends from a dataset

Consider a sensor measurement f that varies with both temperature T and the properties of the fluid being measured. The temperature changes through each day, and the fluid properties can be assumed to vary less frequently. If I cross-plot the data in Excel, then by eye I can very easily draw a straight line through some points, translate that line horizontally, and "voilà", that same line fits through other clusters of points. So if that line has slope …
Category: Data Science

Subsequence pattern matching for time series

I have a set of time series data (similar to voice sequence data) with the pattern shown in the first figure (theoretical data). The measured data is shown in the second figure. What I want to do is locate the subsequence pattern shown in the red squares. Is there any algorithm that solves this problem? It looks like a classification/regression problem in machine learning, but I have no idea how to start.
Category: Data Science

Find repeating patterns in data

I have a database of sequential events for multiple animals. The events are represented by integers, so it looks something like: Animal A: [1,6,4,2,5,7,8] Animal B: [1,6,5,4,1,6,7] Animal C: [5,4,2,1,6,4,3] I can see manually that every event 6 is preceded by event 1, and that event 4 happens soon after a 1,6 combination. But these are only easy to spot in such a small dataset; the real lists are 10000+ events per animal. Is there a way to use an algorithm or machine …
Category: Data Science
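
One simple starting point for the observations in the question is plain n-gram counting. This is a minimal sketch and not a full sequential pattern mining method. It uses the three example lists from the excerpt; the function names `count_ngrams` and `predecessor_counts` are hypothetical helpers introduced for illustration.

```python
from collections import Counter

# Example sequences copied from the question above.
sequences = {
    "A": [1, 6, 4, 2, 5, 7, 8],
    "B": [1, 6, 5, 4, 1, 6, 7],
    "C": [5, 4, 2, 1, 6, 4, 3],
}


def count_ngrams(seqs, n):
    """Count every run of n consecutive events across all animals."""
    counts = Counter()
    for events in seqs.values():
        for i in range(len(events) - n + 1):
            counts[tuple(events[i:i + n])] += 1
    return counts


def predecessor_counts(seqs, target):
    """Count which event immediately precedes each occurrence of target."""
    counts = Counter()
    for events in seqs.values():
        for prev, cur in zip(events, events[1:]):
            if cur == target:
                counts[prev] += 1
    return counts


bigrams = count_ngrams(sequences, 2)
before_six = predecessor_counts(sequences, 6)

print(bigrams.most_common(3))
print(before_six)
```

On the example data, every occurrence of event 6 has event 1 as its immediate predecessor, which matches the manual observation. For lists of 10000+ events the same counting still runs in linear time per animal, but patterns with gaps between events would need a dedicated sequential pattern mining approach.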

Correlation/Pattern Recognition in Lists

I am looking for algorithms to find patterns or, more precisely, correlations between lists and an output. Let us assume I have a database like this: Input: [A,C,D,E...], Output: Positive Input: [A,B,C,E,F...], Output: Negative The problem is that there are roughly 1000 distinct input values, not just 6 as in my example (A-F). The output is binary, though. Do you know of any algorithm that detects correlations in the inputs to finally detect the most critical inputs that …
Category: Data Science

I have a data set of optimal values after simulations. How can I find out whether this data set follows a specific pattern or whether any relation exists?

In the simulation I am conducting, I have a set of triangles and I select the optimal triangle based on my metric. After every simulation, I obtain an optimal triangle and I note down the lengths of its three sides. So I conducted this simulation a large number of times and noted the lengths of the three sides of the optimal triangle. Now I want to see if there is any pattern which these three sides follow or if there …
Category: Data Science

Detecting synchronization cascades in time series

I am researching delayed synchronisation in a system of coupled oscillators. There is a one-way causal connection between the oscillators, which leads to the synchronisation events occurring in a rough sequence. When I plot this data, a very clear pattern emerges. I am looking for a way of identifying and fitting a line to the highlighted areas of the graph. Do you have any ideas on how to find these patterns?
Category: Data Science

Avoiding overfitting in unsupervised ML

I am using an unsupervised pattern matching approach to create a trading strategy. I use the output of the pattern matching to decide whether to enter a trade or not. To decide on the best pattern parameters, I run several combinations over the entire data set and choose the parameters that yield the best profits. My question is whether this would be considered overfitting. If so, how can I avoid it? I looked at several posts on StackOverflow …
Category: Data Science

Help with DDP Mining algorithm for Effective Classification of data sets from 2 groups

I'm trying to implement the DDPmine algorithm from this article as part of a project, and I do not understand where in the algorithm the class label of each transaction is used. We have transactions from 2 different groups; suppose group A has the class label "0" and group B has the class label "1". We want to find the discriminative patterns that are frequent in each group but not in the 2 groups combined, but in which part of …
Category: Data Science

Finding a pattern in responses in R programming

I did a study in which around 1000 participants took a test (100 questions). For each question, participants were asked to choose between two texts (text 1 and text 2) and decide which text is easier for them. Now, in R, I want to check whether any participants followed a pattern. For example, a participant may have chosen only text 1 or only text 2. I also want to examine response string screening for participants …
Category: Data Science
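
The "only text 1 or only text 2" check described in the question can be sketched as below, in Python rather than R. The toy data, the function names, and the two extra checks (strict alternation and longest run of identical answers) are assumptions added for illustration; the excerpt does not say which screening rules the study needs.

```python
def longest_run(responses):
    """Length of the longest streak of identical consecutive answers."""
    if not responses:
        return 0
    best = current = 1
    for prev, cur in zip(responses, responses[1:]):
        current = current + 1 if cur == prev else 1
        best = max(best, current)
    return best


def is_alternating(responses):
    """True if every answer differs from the one before it (1,2,1,2,...)."""
    return len(responses) > 1 and all(
        a != b for a, b in zip(responses, responses[1:])
    )


def screen(responses):
    """Summarise simple response-pattern flags for one participant."""
    return {
        "straight_lined": len(set(responses)) == 1,  # only text 1 or only text 2
        "alternating": is_alternating(responses),
        "longest_run": longest_run(responses),
    }


# Hypothetical toy data: 1 = chose text 1, 2 = chose text 2.
participants = {
    "p1": [1] * 100,
    "p2": [1, 2] * 50,
    "p3": [1, 1, 2, 1, 2, 2, 2, 1, 2, 1],
}

flags = {pid: screen(answers) for pid, answers in participants.items()}
for pid, result in flags.items():
    print(pid, result)
```

A long `longest_run` value on its own does not prove careless responding, so any cut-off would need to be chosen and justified for the study at hand.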
