jaccard-coefficient

Cluster Evaluation with Jaccard and Rand Index

Mirko

May 22, 2022 19:00

I've clustered my data according to 3 criteria into 3 groups. I used k-means to obtain those clusters, so the label for each cluster is random and changes at each script run. To evaluate the consistency of my clusters I decided to use the Jaccard index, but I can't understand how to apply it properly. Let's say I have this data, where alpha, beta and gamma are the 3 methods, and the Cluster Index is the value returned by k-means for …

Topic: model-evaluations jaccard-coefficient visualization python clustering

Category: Data Science
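Because k-means labels are arbitrary, one common way to compare two clusterings like those in the question above is to count pairs of points rather than compare labels directly. The sketch below is only an illustration under that assumption: the label lists `alpha`, `beta` and `gamma` are made-up data (the excerpt's actual table is truncated), and `pair_counts`, `pair_jaccard` and `rand_index` are hypothetical helper names.

```python
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Classify every pair of points by whether each clustering groups it together."""
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def pair_jaccard(labels_a, labels_b):
    """Pairs grouped together by both, over pairs grouped together by at least one."""
    both, only_a, only_b, _ = pair_counts(labels_a, labels_b)
    denom = both + only_a + only_b
    return both / denom if denom else 1.0


def rand_index(labels_a, labels_b):
    """Fraction of pairs on which the two clusterings agree."""
    both, only_a, only_b, neither = pair_counts(labels_a, labels_b)
    total = both + only_a + only_b + neither
    return (both + neither) / total if total else 1.0


# Hypothetical labels for six points under three methods.
alpha = [0, 0, 1, 1, 2, 2]
beta = [2, 2, 0, 0, 1, 1]   # same partition as alpha, labels permuted
gamma = [0, 0, 0, 1, 2, 2]  # a genuinely different partition

print(pair_jaccard(alpha, beta), rand_index(alpha, beta))
print(pair_jaccard(alpha, gamma), rand_index(alpha, gamma))
```

Since only co-membership of pairs is counted, renaming the clusters between runs does not change either score.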

Which string distance equation for fuzzy-matching person names is reliable?

Canovice

May 17, 2022 12:29

A reproducible example with a small bit of R code is available in this stackoverflow post (linked so I don't need to re-type the code). The fuzzytext library in R has the following available string methods: c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex"). Our use case is matching (left-joining) basketball player names from 2 different sources. From the stackoverflow post, we have the following concerns to account for when string matching names: The left join shouldn't …

Topic: jaccard-coefficient similarity r

Category: Data Science

Distance Metric between 2 lists of sets

pettinato

April 11, 2022 18:31

I have 2 lists of sets and I want to calculate a distance. set1 = [ {'A', 'B', 'C'}, {'A', 'D', 'X'}, {'X', 'A'} ] set2 = [ {'A', 'B', 'C', 'D'}, {'A', 'X'}, {'X', 'A', 'B'} ] So if the lists of sets are equal I want the distance to be 0, and if unequal then I want the distance to be higher than 0. The exact distance doesn't really matter as I'll ultimately be aggregating to compare …

Topic: jaccard-coefficient distance

Category: Data Science
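One possible reading of the question above is to pair the sets by position and average their Jaccard distances. The sketch below assumes exactly that (position-aligned lists of equal length), which the truncated excerpt does not confirm; `jaccard_distance` and `mean_jaccard_distance` are hypothetical helper names, while `set1` and `set2` are copied from the excerpt.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_jaccard_distance(list_a, list_b):
    """Average Jaccard distance over position-aligned pairs of sets."""
    if len(list_a) != len(list_b):
        raise ValueError("both lists must contain the same number of sets")
    if not list_a:
        return 0.0
    pairs = zip(list_a, list_b)
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(list_a)


set1 = [{'A', 'B', 'C'}, {'A', 'D', 'X'}, {'X', 'A'}]
set2 = [{'A', 'B', 'C', 'D'}, {'A', 'X'}, {'X', 'A', 'B'}]

print(mean_jaccard_distance(set1, set1))  # identical lists give 0.0
print(mean_jaccard_distance(set1, set2))  # differing lists give a positive value
```

If the order of the sets within each list carries no meaning, the pairs would have to be matched some other way before averaging.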

Metrics - multi-class model comparisons

Tavi

March 19, 2022 23:18

I am looking for a way to quantify the performance of multi-class model labelers, and thus compare them. I want to account for the fact that some classes are ‘closer’ than others (for example, a ‘car’ is ‘closer’ to a ‘truck’ than a ‘flower’ is). So, if a labeler classifies a car as a truck, that is better than classifying the car as a flower. I am considering using a Jaccard similarity score. Will this do what I want?

Topic: metric machine-learning-model jaccard-coefficient

Category: Data Science

What is the correct formula for Jaccard coefficient with integer vectors?

Veronica

October 17, 2020 22:34

I understand the Jaccard index is the number of elements in common divided by the total number of distinct elements. But there seems to be some discrepancy or terminology confusion about Jaccard being applied to "binary vectors", meaning a vector with binary attributes (0, 1), or "integer vectors", meaning any vector with integer values (2, 5, 6, 8). Are there two formulas depending on the type of elements in the vector? This answer comments about "binary vectors" which "they can …

Topic: metric jaccard-coefficient similarity clustering

Category: Data Science
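One generalization that covers both cases in the question above is the weighted Jaccard similarity (sometimes called the Ruzicka similarity): the sum of element-wise minima divided by the sum of element-wise maxima. The sketch below assumes the integer vectors are non-negative counts compared position by position, which is only one reading of the truncated excerpt; the function name and the example vectors are hypothetical.

```python
def weighted_jaccard(x, y):
    """Sum of element-wise minima over sum of element-wise maxima."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in (*x, *y)):
        raise ValueError("this formula assumes non-negative entries")
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    # Two all-zero vectors are treated as identical by convention.
    return numerator / denominator if denominator else 1.0


# On 0/1 vectors this matches the set formula: |intersection| / |union|.
binary_x = [1, 0, 1, 1]
binary_y = [1, 1, 0, 1]
print(weighted_jaccard(binary_x, binary_y))

# On non-negative integer vectors it compares magnitudes position by position.
count_x = [2, 5, 0, 8]
count_y = [1, 5, 6, 8]
print(weighted_jaccard(count_x, count_y))
```

If the integers are instead meant as set members (the set {2, 5, 6, 8}), the plain set formula applies and no second formula is needed.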

Efficiently Sending Two Series to a Function For Strings with an application to String Matching (Dice Coefficient)

PythonNoob

August 4, 2020 10:54

I am using a Dice Coefficient based function to calculate the similarity of two strings: def dice_coefficient(a,b): try: if not len(a) or not len(b): return 0.0 except: return 0.0 if a == b: return 1.0 if len(a) == 1 or len(b) == 1: return 0.0 a_bigram_list = [a[i:i+2] for i in range(len(a)-1)] b_bigram_list = [b[i:i+2] for i in range(len(b)-1)] a_bigram_list.sort() b_bigram_list.sort() lena = len(a_bigram_list) lenb = len(b_bigram_list) matches = i = j = 0 while (i < lena and j …

Topic: jaccard-coefficient pandas python parallel efficiency

Category: Data Science

Monotonicity of Jaccard and Dice in multilabel datasets

VSR

March 23, 2020 03:21

I understand that Jaccard and Dice follow a monotonic relation on binary datasets because the two are related as $J = {S \over {(2 - S)}}$, and I guess this would be the case when micro-average is used with multi-label datasets. However, would the two metrics follow a monotonic relation when macro-average is used?

Topic: f1score jaccard-coefficient multilabel-classification

Category: Data Science
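The relation $J = S / (2 - S)$ quoted in the question above can be checked numerically, and the same mapping can be used to probe the macro-average case. The sketch below is an illustration only: the example sets and the per-label Dice scores for the two systems are made-up values, and the helper names are hypothetical.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a, b):
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def dice_to_jaccard(s):
    """Map a Dice score S to the corresponding Jaccard score J = S / (2 - S)."""
    return s / (2 - s)


def mean(values):
    return sum(values) / len(values)


# Single pair of sets: the identity holds exactly.
a, b = {1, 2, 3, 4}, {3, 4, 5}
print(jaccard(a, b), dice_to_jaccard(dice(a, b)))

# Hypothetical per-label Dice scores for two systems on two labels.
system_1_dice = [0.5, 0.5]
system_2_dice = [0.9, 0.05]

macro_dice = (mean(system_1_dice), mean(system_2_dice))
macro_jaccard = (
    mean([dice_to_jaccard(s) for s in system_1_dice]),
    mean([dice_to_jaccard(s) for s in system_2_dice]),
)
print(macro_dice, macro_jaccard)
```

With these made-up scores, system 1 has the higher macro-averaged Dice while system 2 has the higher macro-averaged Jaccard, because the mapping is convex and averaging is applied after it.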

What is the state of the art/research metric to compare ellipses, other than the Jaccard coefficient?

Tollpatsch

March 18, 2020 12:08

I'm looking for the metric, if there is one, to compare ellipses with each other. Last time I had a similar dataset (malaria cells, now it's pupils) I used the Jaccard coefficient, but that was more because I didn't have the time to do further research on this topic. I used the Jaccard coefficient like this: - transform the multi-d data into 1D to make the comparison even possible. Even though it worked quite well, I didn't like it that …

Topic: metric jaccard-coefficient data

Category: Data Science

Jaccard Similarity with Binary Data

JessicaRabi

August 17, 2019 10:37

I have 5400 rows of data and 3211 columns of attributes. The first 4 columns are ID/Name/ParentID/ObjectType - the rest of the 3207 columns are the attributes that are to be used for similarity measures. Huge dimensionality, I know, but I wanted to (as a first step) just see how this data clusters and finds similarity between all attributes. I converted all attributes values to "0" if there was no value, and "1" if there was a value. I thought …

Topic: jaccard-coefficient visualization python similarity data-cleaning

Category: Data Science

When would I use a specific similarity coefficient over another?

cantyousee

July 11, 2019 20:00

Like using Jaccard over Dice. I want real examples of when I would prefer to use Jaccard, Dice, cosine or any other similarity coefficient.

Topic: jaccard-coefficient cosine-distance similarity

Category: Data Science

Jaccard similarity calculate similarity

mitexabel

May 24, 2019 13:38

It is not clear to me how to calculate similarity between two products from the example. How do they calculate that?

Topic: jaccard-coefficient similarity

Category: Data Science

Similarity of search results using Jaccard

HCg

December 11, 2018 15:01

I have a set of search results with ranking position, keyword and URL. I want to make a distance matrix so I can cluster the keywords (or the URLs). One approach would be to take the first n URL rankings for each keyword and use Jaccard similarity. However, I also want higher position ranks to be weighted more highly than lower position ranks - for example two keywords that have the same URL in positions 1 and 2 are more …

Topic: numpy jaccard-coefficient python

Category: Data Science

Jaccard similarity between two items

HonzaB

November 3, 2016 06:05

Calculating similarity between two users is rather straightforward. Consider the following example: User A = {7,3,2,4,1}, User B = {4,1,9,7,5}. Products in common = {1,4,7}. Union of products = {1,2,3,4,5,7,9}. Hence the Jaccard similarity: 3/7 = 0.429. However, it is not clear to me how to calculate similarity between two products. Let's say I want to calculate similarity between products 7 and 1 from the previous example; how can one achieve that?

Topic: jaccard-coefficient similarity

Category: Data Science
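The user-to-user figure in the question above can be reproduced directly, and one common way to extend it to products is to describe each product by the set of users who bought it. The sketch below assumes that approach; the `baskets` data is taken from the excerpt, while `jaccard` and `buyers` are hypothetical names.

```python
from collections import defaultdict


def jaccard(a, b):
    """|a & b| / |a | b|, with 0.0 for two empty sets by convention."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# User -> set of purchased products, as given in the question.
baskets = {
    "A": {7, 3, 2, 4, 1},
    "B": {4, 1, 9, 7, 5},
}

# User-to-user similarity: shared products over all products.
print(round(jaccard(baskets["A"], baskets["B"]), 3))

# Invert the mapping: product -> set of users who bought it.
buyers = defaultdict(set)
for user, products in baskets.items():
    for product in products:
        buyers[product].add(user)

# Product-to-product similarity: shared buyers over all buyers.
print(jaccard(buyers[7], buyers[1]))  # both bought by A and B
print(jaccard(buyers[7], buyers[2]))  # product 2 bought by A only
```

With only two users the product scores are coarse; the same inversion applies unchanged to a larger purchase history.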

Implementing Frequently bought together using a DB

QuickNDirty

May 9, 2016 04:18

We have a classic structure of an online shop database (products, customers, sales) and we want to implement a Frequently bought together feature. Our software is in ASP.NET and we do not know PHP to reverse engineer how this is being done in Magento. And all we need is a simple Frequently bought together (not with discounts like Magento offers). I understand that this is machine learning and one of the more common ways is Jaccard coefficient. Is that the …

Topic: jaccard-coefficient databases machine-learning

Category: Data Science
