What's a good index for choosing the number of clusters so that the resulting clusters are homogeneous?
I am clustering a one-dimensional dataset and need a way to automatically choose the optimal number of clusters from $k \in \{2, 3, 4, 5, 6\}$. The number of observations to cluster is small (usually around 10-13). I think I'd need to try optimising for one of two goals (or both at once) and see which works best:
1. Achieve the partitioning with the lowest within-cluster variances. Intuitively, I would go for something like the average within-cluster variance, but I'm actually fine with some clusters consisting of a single observation (this is desirable for outliers, which is why I check a relatively high number of clusters). However, the average within-cluster variance would always favour a lower number of clusters.
2. Achieve the partitioning in which the distances between consecutive observations within a cluster are as similar as possible. For example, if a cluster contains the objects $x_1 \le x_2 \le x_3 \le x_4$ (sorted), I'd like $d(x_1, x_2) \approx d(x_2, x_3) \approx d(x_3, x_4)$, where $d$ is the Euclidean distance.
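A minimal sketch, assuming plain Python 3 with only the standard library, of how the two goals could be scored for each candidate $k$. Because the data are one-dimensional and small, the exact k-means partition for each $k$ can be found by brute force over contiguous splits of the sorted values (in 1D an optimal k-means cluster is always an interval of the sorted data). `mean_within_variance` is the average within-cluster variance from goal 1; `gap_unevenness` is a hypothetical index made up here for goal 2 (standard deviation of the consecutive gaps, averaged over clusters), not an established measure. All function names and the example data are illustrative.

```python
from itertools import combinations
from statistics import mean, pstdev, pvariance


def contiguous_partitions(xs, k):
    """Yield every split of the sorted list xs into k non-empty contiguous groups.

    In one dimension an optimal k-means partition is contiguous in sorted order,
    so enumerating these C(n-1, k-1) splits is an exact brute-force search.
    For n = 13 and k = 6 that is only C(12, 5) = 792 candidates.
    """
    n = len(xs)
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        yield [xs[bounds[i]:bounds[i + 1]] for i in range(k)]


def within_ss(clusters):
    """Total within-cluster sum of squares (the k-means objective)."""
    return sum(len(c) * pvariance(c) for c in clusters)


def mean_within_variance(clusters):
    """Goal 1: unweighted average of per-cluster variances (a singleton scores 0)."""
    return mean(pvariance(c) for c in clusters)


def gap_unevenness(clusters):
    """Goal 2 (hypothetical index): how unequal consecutive gaps are within clusters.

    Per cluster, the population standard deviation of the gaps between
    neighbouring sorted points; clusters with fewer than 3 points have at most
    one gap and score 0. Returns the average over all clusters.
    """
    scores = []
    for c in clusters:
        gaps = [b - a for a, b in zip(c, c[1:])]
        scores.append(pstdev(gaps) if len(gaps) >= 2 else 0.0)
    return mean(scores)


def evaluate(xs, ks=range(2, 7)):
    """For each k, find the exact 1D k-means partition and score it on both goals.

    Assumes len(xs) >= max(ks); otherwise there is no valid partition and min() raises.
    """
    xs = sorted(xs)
    results = {}
    for k in ks:
        best = min(contiguous_partitions(xs, k), key=within_ss)
        results[k] = {
            "clusters": best,
            "mean_within_var": mean_within_variance(best),
            "gap_unevenness": gap_unevenness(best),
        }
    return results


if __name__ == "__main__":
    # Made-up values, for illustration only.
    data = [1.0, 1.2, 1.4, 5.0, 5.5, 6.0, 6.5, 12.0, 20.0, 20.3]
    for k, r in evaluate(data).items():
        print(k, round(r["mean_within_var"], 3), round(r["gap_unevenness"], 3), r["clusters"])
```

Running the script prints, for each $k$, both scores and the chosen clusters. Since singletons and two-point clusters score 0 under both measures, it is worth looking at how the values change across $k$ before settling on a selection rule.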
I have looked through the options in scikit-learn, and none of them seems appropriate for my case.