Cluster Evaluation with Jaccard and Rand Index
I've clustered my data into 3 groups according to 3 different criteria. I used k-means to obtain those clusters, so the label assigned to each cluster is arbitrary and changes on every run of the script.
To evaluate the consistency of my clusters I decided to use the Jaccard index, but I can't work out how to apply it properly.
Let's say I have the data below, where alpha, beta and gamma are the 3 methods and the Cluster Index (CI) is the label returned by k-means for each item.
name | CI_alpha | CI_beta | CI_gamma |
---|---|---|---|
a | 1 | 2 | 2 |
b | 1 | 2 | 3 |
c | 1 | 2 | 3 |
d | 2 | 3 | 3 |
e | 2 | 3 | 1 |
f | 2 | 3 | 2 |
g | 3 | 1 | 1 |
h | 3 | 1 | 3 |
What is noteworthy in this dataset is that methods alpha and beta produce exactly the same partition ({a, b, c}, {d, e, f}, {g, h}); only the label numbers differ. Yet a Jaccard index computed on the raw labels would return 0 for this pair, because no item carries the same label in both methods, even though they describe the same clusters.
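To make the problem concrete, here is a minimal sketch of what I mean by a Jaccard index on the raw labels. The function name `labelwise_jaccard` and the choice to average over label values are my own assumptions about what a "naive" application would look like, not a standard definition.

```python
# Cluster labels copied from the table above (rows a..h).
alpha = [1, 1, 1, 2, 2, 2, 3, 3]
beta = [2, 2, 2, 3, 3, 3, 1, 1]


def labelwise_jaccard(labels_a, labels_b):
    """Mean Jaccard index per label value, treating label numbers as meaningful.

    Hypothetical helper: for each label value, compare the set of items that
    received that label under each method, then average the per-label scores.
    """
    scores = []
    for label in sorted(set(labels_a) | set(labels_b)):
        members_a = {i for i, lab in enumerate(labels_a) if lab == label}
        members_b = {i for i, lab in enumerate(labels_b) if lab == label}
        # The union is never empty because the label occurs in at least one list.
        scores.append(len(members_a & members_b) / len(members_a | members_b))
    return sum(scores) / len(scores)


naive_score = labelwise_jaccard(alpha, beta)
print(naive_score)  # 0.0, although alpha and beta group the items identically
```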
Do you have any idea how to obtain an index that is informative despite the arbitrary labels?
I'd also like to know whether the Rand index can be used to estimate the concordance between two clustering methods when I don't know the true clusters, or whether that is completely outside its scope.
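For reference, this is my current understanding of how a pair-counting comparison could work, and I'd appreciate confirmation that it is sound. Instead of comparing labels, it asks for every pair of items whether the two methods agree on putting them together or apart, so it should not depend on label numbers or on having a ground truth. The helper names (`pair_counts`, `rand_index`, `pair_jaccard`) are mine, and this is only a sketch of the unadjusted Rand index and a pair-based Jaccard index, not a chance-corrected measure.

```python
from itertools import combinations

# Cluster labels copied from the table above (rows a..h).
alpha = [1, 1, 1, 2, 2, 2, 3, 3]
beta = [2, 2, 2, 3, 3, 3, 1, 1]
gamma = [2, 3, 3, 3, 1, 2, 1, 3]


def pair_counts(labels_a, labels_b):
    """Count item pairs by whether each clustering puts them in the same cluster."""
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def rand_index(labels_a, labels_b):
    """Fraction of pairs on which the two clusterings agree (together or apart)."""
    both, only_a, only_b, neither = pair_counts(labels_a, labels_b)
    return (both + neither) / (both + only_a + only_b + neither)


def pair_jaccard(labels_a, labels_b):
    """Jaccard index over co-clustered pairs; pairs split by both are ignored.

    Assumes at least one pair is co-clustered by one of the two methods,
    otherwise the denominator is zero.
    """
    both, only_a, only_b, _ = pair_counts(labels_a, labels_b)
    return both / (both + only_a + only_b)


print(rand_index(alpha, beta), pair_jaccard(alpha, beta))    # 1.0 1.0
print(rand_index(alpha, gamma), pair_jaccard(alpha, gamma))  # 15/28 and 1/14
```

On the table above this gives a perfect score for alpha versus beta, and for alpha versus gamma a Rand index of 15/28 (about 0.54) and a pair-based Jaccard of 1/14 (about 0.07). The Rand index counts the many pairs that both methods keep apart as agreement, which I suspect is why it stays fairly high even when the clusterings look unrelated.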
Topic model-evaluations jaccard-coefficient visualization python clustering
Category Data Science