Cluster Evaluation with Jaccard and Rand Index

Question

Mirko

May 22, 2022, 19:00

I've clustered my data into 3 groups according to 3 different criteria. I used k-means to obtain the clusters, so the label assigned to each cluster is arbitrary and changes on every run of the script.

To evaluate the consistency of my clusters I decided to use the Jaccard index, but I can't work out how to apply it properly.

Let's say I have the data below, where alpha, beta and gamma are the 3 methods and the cluster index (CI) is the label returned by k-means, for example.

name	CI_alpha	CI_beta	CI_gamma
a	1	2	2
b	1	2	3
c	1	2	3
d	2	3	3
e	2	3	1
f	2	3	2
g	3	1	1
h	3	1	3

What is noteworthy in this dataset is that methods alpha and beta actually returned identical clusters, but a Jaccard index computed directly on their labels would be 0, because every label differs even though the two label sets describe the same clusters.

Do you have any idea how to obtain an informative index correctly?

Also, I'd like to know whether the Rand index can be used when I don't know the true clusters, to estimate the concordance between clustering methods, or whether that is completely outside its scope.

Topic model-evaluations jaccard-coefficient visualization python clustering

Category Data Science

Oleg · Accepted Answer · December 19, 2021, 01:58

The Rand index (also consider the adjusted Rand index) measures exactly that: the similarity between two clusterings of the data. In Python you can use scikit-learn (sklearn) for this; have a look at its "Clustering performance evaluation" guide for more options.
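As a minimal sketch of what that could look like on the table from the question, assuming scikit-learn 0.24 or later (where `rand_score` is available alongside `adjusted_rand_score`). The variable names are mine, chosen for illustration:

```python
from sklearn.metrics import adjusted_rand_score, rand_score

# Cluster labels for items a..h, copied from the table in the question.
ci_alpha = [1, 1, 1, 2, 2, 2, 3, 3]
ci_beta = [2, 2, 2, 3, 3, 3, 1, 1]
ci_gamma = [2, 3, 3, 3, 1, 2, 1, 3]

# Both scores depend only on how items are grouped, not on label values,
# so no "true" clustering is needed: either argument can be any clustering.
ri_alpha_beta = rand_score(ci_alpha, ci_beta)
ari_alpha_beta = adjusted_rand_score(ci_alpha, ci_beta)

ri_alpha_gamma = rand_score(ci_alpha, ci_gamma)
ari_alpha_gamma = adjusted_rand_score(ci_alpha, ci_gamma)

print("alpha vs beta :", ri_alpha_beta, ari_alpha_beta)
print("alpha vs gamma:", ri_alpha_gamma, ari_alpha_gamma)
```

Both functions are symmetric in their arguments, so it does not matter which clustering is passed first. The adjusted version corrects for chance agreement, which is why it can come out near zero or even slightly negative for clusterings that are unrelated.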

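The Rand index counts agreements over all pairs of data points between two clusterings: a pair agrees when both clusterings put its two points in the same cluster, or both put them in different clusters. Label names never enter the calculation, so CI_alpha and CI_beta would score 1.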
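To make the pair counting concrete, here is a rough pure-Python sketch. The helper names (`pair_counts`, `rand_index`, `pair_jaccard`) are hypothetical and not part of any library. The last function is a pair-based variant of the Jaccard index, which ignores pairs that are separated in both clusterings; it is offered as one possible way to get the label-independent Jaccard-style score the question asks for, not as the only definition.

```python
from itertools import combinations

def pair_counts(labels_a, labels_b):
    """Classify every pair of items by whether each clustering groups it together."""
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1        # together in both clusterings
        elif same_a:
            only_a += 1      # together only in the first clustering
        elif same_b:
            only_b += 1      # together only in the second clustering
        else:
            neither += 1     # apart in both clusterings
    return both, only_a, only_b, neither

def rand_index(labels_a, labels_b):
    """Share of pairs on which the two clusterings agree."""
    both, only_a, only_b, neither = pair_counts(labels_a, labels_b)
    return (both + neither) / (both + only_a + only_b + neither)

def pair_jaccard(labels_a, labels_b):
    """Jaccard index over co-clustered pairs; 'apart in both' pairs are ignored."""
    both, only_a, only_b, _ = pair_counts(labels_a, labels_b)
    denom = both + only_a + only_b
    # Assumption: if no pair is co-clustered in either clustering, treat them as identical.
    return both / denom if denom else 1.0

# Labels for items a..h from the table in the question.
ci_alpha = [1, 1, 1, 2, 2, 2, 3, 3]
ci_beta = [2, 2, 2, 3, 3, 3, 1, 1]
ci_gamma = [2, 3, 3, 3, 1, 2, 1, 3]

print(rand_index(ci_alpha, ci_beta), pair_jaccard(ci_alpha, ci_beta))    # 1.0 1.0
print(pair_counts(ci_alpha, ci_gamma))                                   # (1, 6, 7, 14)
print(rand_index(ci_alpha, ci_gamma), pair_jaccard(ci_alpha, ci_gamma))  # 15/28 and 1/14
```

Because alpha and beta induce the same partition, every one of the 28 pairs agrees and both scores are 1. Against gamma, only one pair (b, c) is grouped together by both methods, which is why the pair-based Jaccard score is so low.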
