distance

Search one 2D distribution for point cluster most similar to another 2D distribution

duhaime

June 4, 2022 09:55

Given a hand-drawn constellation (a 2D distribution of points) and a map of all stars, how would you find the actual star distribution most similar to the drawn distribution? If it's helpful, suppose we can define some maximum allowable threshold of distortion (e.g. a maximum Kolmogorov-Smirnov distance) and we want to find one or more distributions of stars that match the hand-drawn distribution. I keep getting hung up on the fact that the hand-drawn constellation has no notion of scale …

Topic: pattern-recognition distribution distance similarity

Category: Data Science

Date transformation for KNN

Mapp

May 24, 2022 16:06

I have a data set with date features like 01/01/2019 and I would like to use KNN. However, I cannot find a good transformation for dates that gives a meaningful distance for the last feature. For example:

f1 | 1  | 2 | 3  | 4 | 01/01/2019
f2 | 10 | 3 | 12 | 1 | 14/01/2019

Does anyone have any recommendations?

Topic: k-nn data distance machine-learning

Category: Data Science
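
As an illustration of the date question above, one simple option is to map each date to a number of days since a fixed reference date, so that differences between dates become ordinary numeric differences. The sketch below assumes the dd/mm/yyyy format from the example; the function name `date_to_days` and the reference date are hypothetical choices, not part of the original post.

```python
from datetime import date, datetime

# Hypothetical reference date; any fixed date works because KNN only uses differences.
REFERENCE = date(2019, 1, 1)

def date_to_days(text, fmt="%d/%m/%Y"):
    """Convert a date string to the number of days elapsed since REFERENCE."""
    return (datetime.strptime(text, fmt).date() - REFERENCE).days

# The two rows from the example, with the date replaced by a day count.
f1 = [1, 2, 3, 4, date_to_days("01/01/2019")]
f2 = [10, 3, 12, 1, date_to_days("14/01/2019")]
```

The day count will usually need the same scaling as the other features before KNN, otherwise it may dominate the distance.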

matrix profile distance measure characterization

user18602524

May 24, 2022 04:05

If there are various types of distance measures for time series, such as Euclidean, DTW, and shape-based ones, how can we characterize the matrix profile distance measure? Is it a profiling one?

Topic: distance clustering data-mining machine-learning

Category: Data Science

Estimating time to travel between two lat/longs

C Murphy

May 12, 2022 10:05

I'm trying to create an offline estimator for how long it would take to get from one lat/long to another. Two approaches I have come across are the Haversine distance and the Manhattan distance. What I'm thinking of doing is calculating both of them and then using the average between the two as the distance and then use some average speed to calculate time. Since this value will be used as an estimator for drivers in a city a straight …

Topic: manhattan distance

Category: Data Science
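
The averaging idea described in the travel-time question above can be sketched as follows. This is only a rough illustration: the "Manhattan" distance here is approximated as a north-south leg plus an east-west leg, each measured with the haversine formula, and the 30 km/h average speed and all function names are assumptions, not values from the post.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def manhattan_km(lat1, lon1, lat2, lon2):
    """North-south leg plus east-west leg, each measured with haversine."""
    north_south = haversine_km(lat1, lon1, lat2, lon1)
    east_west = haversine_km(lat2, lon1, lat2, lon2)
    return north_south + east_west

def estimate_minutes(lat1, lon1, lat2, lon2, speed_kmh=30.0):
    """Average the two distances, then divide by an assumed average speed."""
    distance = (haversine_km(lat1, lon1, lat2, lon2) + manhattan_km(lat1, lon1, lat2, lon2)) / 2
    return distance / speed_kmh * 60
```

Because the grid-style distance is never shorter than the straight line, the average falls between the two, which matches the intuition that city driving is longer than a straight path but rarely a full right-angle detour.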

How to estimate real distance between two detected objects in an image?

Enzo1912

April 22, 2022 11:05

You may think this is a duplicate, but my situation is different than previously asked questions. The only information I have is the width and height of the bounding boxes of detected people. The dataset I'm working on has images captured in different environments (street, garden, mall, ...). In other words, there is no fixed object in all images I can use as scale. The angle at which each image is captured varies drastically from almost parallel to the ground …

Topic: distance computer-vision

Category: Data Science

Clustering time series based on monotonic similarity

Delforge

April 22, 2022 08:00

Context: I am involved in the task of clustering 1500 time series of 500 observations into a few clusters. The time series all share the same observed properties at different spatial locations, and respond to the same exogenous variables. However, for each time series, the magnitude of the response is very different. For a reference time series $X$, I would like all series of the form $X^a$, for any $a > 0$, to be grouped in the same cluster. Tryouts …

Topic: distance preprocessing time-series clustering

Category: Data Science

Given daily sequence of events with only event ID labels (alphanum strings), what algorithms can be used to detect sequences that are outliers?

demoman

April 20, 2022 15:16

For example, the data might be something like this:

Sequence 1: ["ABC", "AAA", "ZZ123", "RRZZZ45", "AABBCC"]
Sequence 2: ["CBA", "AAA", "YY123", "LMNOP", "AABBCC"]
Sequence 3: ["ABC", "AAA", "ZZ123", "RRZZZ45", "AABBCC"]
...
Sequence N: ["DEF", "AAA", "ZZ123", "YYZZZ45", "AABBCC"]

Sequence 1 and 3 are the same, but sequence 2 and N are different. In the data set, there will be thousands of these sequences every day. Additional questions: How could I calculate similarity (or difference) measure between sequences with sequences of …

Topic: labels distance sequence outlier clustering

Category: Data Science

Siamese vs matching network for correct image category matching

Rambo_john

April 12, 2022 16:07

I have to find the closest match between my image and a bunch of already collected images of different classes in a folder. Which meta-learning approach should I select? I am thinking about a Siamese or a matching network. With a Siamese network, I have to match my image against all existing images in the folder to find the correct match. So do you think I can use a matching network and produce a better result? What is the parameter based on which …

Topic: meta-learning convolutional-neural-network distance similarity

Category: Data Science

Distance Metric between 2 lists of sets

pettinato

April 11, 2022 18:31

I have 2 lists of sets and I want to calculate a distance between them.

set1 = [ {'A', 'B', 'C'}, {'A', 'D', 'X'}, {'X', 'A'} ]
set2 = [ {'A', 'B', 'C', 'D'}, {'A', 'X'}, {'X', 'A', 'B'} ]

So if the lists of sets are equal I want the distance to be 0, and if unequal then I want the distance to be higher than 0. The exact distance doesn't really matter as I'll ultimately be aggregating to compare …

Topic: jaccard-coefficient distance

Category: Data Science
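
One way to get a distance with the properties asked for in the question above is to average the Jaccard distance over corresponding sets. This is a minimal sketch that assumes the two lists have the same length and that sets are compared position by position; the function names are illustrative and the original post does not state how sets should be paired.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def list_of_sets_distance(xs, ys):
    """Mean Jaccard distance over position-wise pairs (assumes equal lengths)."""
    if len(xs) != len(ys):
        raise ValueError("both lists must contain the same number of sets")
    if not xs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in zip(xs, ys)) / len(xs)

set1 = [{'A', 'B', 'C'}, {'A', 'D', 'X'}, {'X', 'A'}]
set2 = [{'A', 'B', 'C', 'D'}, {'A', 'X'}, {'X', 'A', 'B'}]
distance = list_of_sets_distance(set1, set2)
```

The result is 0 when every pair of sets is equal and grows toward 1 as the pairs share fewer elements.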

Cosine-like alternative to Mahalanobis distance

a_gdevr

April 10, 2022 13:41

I would like to have a distance measure that takes into account how spread out the vectors in a dataset are, to weight the absolute distance from one point to another. The Mahalanobis distance does exactly this, but it is a generalization of Euclidean distance, which is not particularly suitable for high-dimensional spaces (see for instance here). Do you know of any measure that is suitable in high-dimensional spaces while also taking into account the correlation between datapoints? Thank you! :)

Topic: cosine-distance distance

Category: Data Science

Can siamese model trained with euclidean distance as distance metric use cosine similarity during inference?

B200011011

April 4, 2022 18:00

Suppose I have 3 embeddings (Anchor, Positive, Negative) from a Siamese model trained with Euclidean distance as the distance metric for triplet loss. Can cosine similarity be used during inference? I have noticed that if I calculate Euclidean distance with the model from A, P, N, results seem somewhat consistent, with matching images getting smaller distance and non-matching images getting bigger distance in most cases. In case I use cosine similarity on above embeddings I am unable to differentiate as similarity values …

Topic: siamese-networks cosine-distance distance deep-learning machine-learning

Category: Data Science
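
A small sketch may help frame the question above. For L2-normalized vectors, squared Euclidean distance and cosine similarity are tied by the identity ||u - v||^2 = 2 - 2 cos(u, v), so both give the same ranking. The post does not say whether the model's embeddings are normalized, so the normalization step and the made-up vectors below are assumptions; without normalization the two measures can rank pairs differently.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up embeddings, normalized to unit length for illustration only.
anchor = l2_normalize([0.2, 0.9, -0.4])
positive = l2_normalize([0.25, 0.8, -0.5])

d2 = squared_euclidean(anchor, positive)
cos = cosine_similarity(anchor, positive)
# For unit vectors, d2 equals 2 - 2 * cos up to floating-point error.
```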

Assessing Group Similarities and Dissimilarities Post PCA

hc_ds

April 3, 2022 23:01

The goal is to assess similarity and dissimilarity between 6 known groups. The original data began with the 6 known groups and 2,700+ variables all on a scale of 0 to 100. I have performed PCA to reduce the 2700+ variables into 5 principal components using the dudi.pca function from the ade4 package in R. Here are the eigenvalues for the components:

       eigenvalue  variance.percent  cumulative.variance.percent
Dim.1  998.3274    36.635867         36.63587
Dim.2  670.1278    24.591848         61.22771
Dim.3  482.2372    17.696776         78.92449
Dim.4  352.2806    12.927728 …

Topic: pca distance similarity r

Category: Data Science

When would one use Manhattan distance as opposed to Euclidean distance?

Bitcoin Cash - ADA enthusiast

March 29, 2022 07:20

I am looking for a good argument for why one would use the Manhattan distance over the Euclidean distance in machine learning. The closest thing I have found to a good argument so far is in this MIT lecture. At 36:15 you can see on the slides the following statement: "Typically use Euclidean metric; Manhattan may be appropriate if different dimensions are not comparable." Shortly after, the professor says that, because the number of legs of a reptile varies …

Topic: distance classification machine-learning

Category: Data Science

Does Sliced Wasserstein Distance work in higher than 2 dimensions?

wigeon

March 9, 2022 18:33

I had thought that it only worked for 2D distributions. I am trying to implement a sliced Wasserstein autoencoder and I was wondering if my latent space can be larger than 2D.

Topic: wasserstein autoencoder distance

Category: Data Science
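
To make the dimensionality question above concrete: the sliced Wasserstein distance is defined through one-dimensional projections, so it applies to samples of any dimension. The following is a minimal Monte Carlo sketch in plain Python, assuming two samples of equal size and using the 1-Wasserstein distance on each projection; the function names, the number of projections, and the synthetic 5D data are all illustrative assumptions.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere via a normalized Gaussian."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sliced_wasserstein(xs, ys, n_projections=50, seed=0):
    """Monte Carlo estimate of the sliced 1-Wasserstein distance (equal-sized samples)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    dim = len(xs[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_projections):
        theta = random_unit_vector(dim, rng)
        # Project every point onto the direction, then sort the 1D values.
        px = sorted(sum(a * t for a, t in zip(x, theta)) for x in xs)
        py = sorted(sum(a * t for a, t in zip(y, theta)) for y in ys)
        # In 1D, the optimal transport plan matches sorted values in order.
        total += sum(abs(a - b) for a, b in zip(px, py)) / len(px)
    return total / n_projections

# Synthetic 5-dimensional samples: b is a shifted copy of a.
data_rng = random.Random(1)
a = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
b = [[x + 3.0 for x in row] for row in a]
```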

Levenshtein distance vs simple for loop

Jadon Steinmetz

February 22, 2022 21:29

I have recently begun studying different data science principles, and have had a particular interest as of late in fuzzy matching. For preface, I'd like to include smarter fuzzy searching in a proprietary language named "4D" in my workplace, so access to libraries is pretty much nonexistent. It's also worth noting that the client side is currently single-threaded, so taking advantage of multi-threaded matrix manipulations is out of the question. I began studying the Levenshtein algorithm and got that …

Topic: fuzzy-logic distance efficiency

Category: Data Science
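
For reference alongside the question above, the standard dynamic-programming Levenshtein distance needs no libraries and only two rows of memory. The sketch below is written in Python rather than 4D, purely as an illustration that should translate to any language with arrays and loops; it is not the asker's implementation.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to use less memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]
```

The running time is proportional to the product of the two string lengths, and everything runs on a single thread.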

Vectorized String Distance

Anatole

February 22, 2022 10:46

I am looking for a way to calculate the string distance between two Pandas dataframe columns in a vectorized way. I tried the distance and textdistance libraries, but they require using df.apply, which is incredibly slow. Do you know any way to compute a string distance using only column operations? Thanks

Topic: numpy distance preprocessing pandas

Category: Data Science

How can I use Hellinger Distance on arrays of different length?

Giacomo fava

February 12, 2022 00:02

I have to use Hellinger distance to compare arrays that are not the same length. How do you do this correctly? Putting a zero in the missing fields for the shorter array does not sound like the best method to me. Some more info on my data: Most array dimensions are (1,58), but some others are (1,28). Arrays contain numbers from 1 to 3. Example: Array1=[1 1 3 2 3] Array2=[2 3 1 1] One possible solution: newArray2=[2 3 …

Topic: distance k-means clustering data-mining machine-learning

Category: Data Science
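
One possible reading of the question above: since the Hellinger distance compares probability distributions, each array can first be turned into relative frequencies over the values 1 to 3, which removes the length mismatch. This sketch assumes the order of the values within an array does not matter, which the post does not confirm; the helper names are illustrative.

```python
import math
from collections import Counter

def to_distribution(values, categories):
    """Relative frequency of each category in the array."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]

def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same categories."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

categories = [1, 2, 3]
array1 = [1, 1, 3, 2, 3]
array2 = [2, 3, 1, 1]

p = to_distribution(array1, categories)
q = to_distribution(array2, categories)
h = hellinger(p, q)
```

The value lies between 0 (identical frequencies) and 1 (no shared values), regardless of how long each array is.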

Clustering without information about identifier

bazinga

January 27, 2022 03:09

I have a data set with different products and a binary value indicating whether each product was sold in a store or not. It looks like:

   product_id  store_1  store_2  store_3  store_4  store_5  store_6
0  A           1        0        0        1        0        1
1  B           1        1        0        0        1        0

Is there any way to cluster these products without any information about the products themselves? One thought I had was to generate distances between products and then cluster the product X product matrix. Is this …

Topic: distance clustering

Category: Data Science

What's an appropriate clustering quality estimate / metric for precomputed distance in HDBSCAN?

Tarun

January 18, 2022 14:00

HDBSCAN supports estimation of clusters from precomputed distances. However, the Python implementation of HDBSCAN (scikit-contrib) doesn't create minimum spanning trees in the absence of raw data when precomputed distance matrices are provided as inputs. Therefore, it doesn't compute the relative_validity score or DBCV score to facilitate hyperparameter tuning in such instances. I am trying to use a Euclidean projection (square-root transform) of Gower dissimilarity composite (without Podani's option) as a precomputed metric in HDBSCAN. Since distance-based scores like Silhouette are …

Topic: metric distance dbscan python clustering

Category: Data Science

Question about Similarity vs Dissimilarity Matrix

Thomas Formal

December 28, 2021 08:00

Right now, I'm working on coming up with a similarity vs dissimilarity matrix for a set of data points for a clustering algorithm. My question is if I want to use one of the many clustering algorithms given in $R$, such as the K-Medoids algorithm, does it require a similarity or dissimilarity matrix as its parameter? What's the difference between the two? If I use the Gower Distance from the Daisy function in R, does it output a similarity or …

Topic: distance similarity k-means clustering bigdata

Category: Data Science
