What is a good index for choosing the number of clusters so that the resulting clusters are homogeneous?

Question

jakes

May 8, 2022 07:07

I am clustering a one-dimensional dataset, and I need a way to automatically decide the optimal number of clusters from $k \in \{2, 3, 4, 5, 6\}$. The number of observations to cluster is small (usually around 10-13). I think I need to try optimising for one of two goals (or both at the same time) and see which works best:

1. Achieve a partition with the lowest within-cluster variances. Intuitively, I would go for something like the average within-cluster variance, but I'm fine with some clusters consisting of a single observation (this is actually desirable for outliers, which is why I check a relatively high number of clusters). And the average within-cluster variance would always favour a lower number of clusters.
2. Achieve a partition in which the distances between neighbouring observations within a cluster are as similar as possible. For example, if my cluster contains objects $a, b, c, d$ (sorted), I'd like $d(a, b) \approx d(b, c) \approx d(c, d)$, where $d$ is the Euclidean distance.
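A minimal Python sketch of how the two goals above could be scored for each candidate $k$. It assumes that clusters are contiguous runs of the sorted values; this is known to hold for the variance objective in one dimension, but it is only an assumption for the gap criterion. The function names and the particular irregularity score (sum of per-cluster standard deviations of consecutive gaps) are illustrative choices, not a standard index. With about 13 points and $k \le 6$ there are at most $\binom{12}{5} = 792$ splits per $k$, so exhaustive search is cheap.

```python
from itertools import combinations
from statistics import pstdev, pvariance


def contiguous_partitions(xs, k):
    """Yield every way to split sorted 1-D data xs into k contiguous, non-empty groups."""
    n = len(xs)
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        yield [xs[a:b] for a, b in zip(bounds, bounds[1:])]


def pooled_within_variance(groups):
    """Goal 1: size-weighted mean of the within-group (population) variances.

    A singleton group has variance 0, so isolating an outlier is not penalised.
    """
    n = sum(len(g) for g in groups)
    return sum(len(g) * pvariance(g) for g in groups) / n


def gap_irregularity(groups):
    """Goal 2 (illustrative score): sum over groups of the std. dev. of consecutive gaps.

    0 means every group is evenly spaced, i.e. d(a, b) == d(b, c) == ...;
    groups with fewer than two gaps contribute 0.
    """
    total = 0.0
    for g in groups:
        gaps = [b - a for a, b in zip(g, g[1:])]
        if len(gaps) >= 2:
            total += pstdev(gaps)
    return total


def scan_k(xs, ks=range(2, 7), objective=pooled_within_variance):
    """For each k, exhaustively find the contiguous partition minimising `objective`
    and report (partition, goal-1 score, goal-2 score). Only feasible for small n."""
    xs = sorted(xs)
    results = {}
    for k in ks:
        if k > len(xs):  # cannot form more non-empty groups than points
            continue
        best = min(contiguous_partitions(xs, k), key=objective)
        results[k] = (best, pooled_within_variance(best), gap_irregularity(best))
    return results


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 30.0]  # made-up example values
    for k, (groups, var, irr) in scan_k(data).items():
        print(k, groups, round(var, 3), round(irr, 3))
```

Passing `objective=gap_irregularity` optimises goal 2 directly instead. For the size-weighted variance used here, the best achievable score can only stay the same or fall as $k$ grows, so comparing it across $k$ would still need some penalty for extra clusters.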

I have looked through the options in scikit-learn, and none of them seems appropriate for my case.

Topic: unsupervised-learning, clustering

Category: Data Science

Brian Spiering · Accepted Answer · November 24, 2020 14:38

Your problem is not appropriate for machine learning. Machine learning will not give a robust answer either for the clustering itself (the parameters) or for automatically choosing the number of clusters (a hyperparameter). There are too few examples (10-13), and the ratio of examples to groups (2-5) is also too low.
