What's an appropriate clustering quality estimate / metric for precomputed distance in HDBSCAN?

Question

Tarun

January 18, 2022, 14:00

HDBSCAN supports clustering from precomputed distances. However, the Python implementation of HDBSCAN (scikit-learn-contrib) does not build a minimum spanning tree from the raw data when a precomputed distance matrix is supplied as input. As a result, it does not compute the relative_validity score or the DBCV score that would otherwise help with hyperparameter tuning.

I am trying to use a Euclidean projection (square-root transform) of the composite Gower dissimilarity (without Podani's option) as a precomputed metric in HDBSCAN. Since distance-based scores such as the Silhouette score are not appropriate for density-based clustering, is it possible to compute a meaningful score like DBCV to estimate cluster quality with precomputed distances in HDBSCAN?
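
For concreteness, here is a minimal sketch of how such a precomputed matrix might be built. It assumes only numeric and nominal features, no missing values, and equal feature weights. The helper names `gower_dissimilarity` and `sqrt_transform` and the toy rows are hypothetical, made up for illustration, and are not part of any library.

```python
import math


def gower_dissimilarity(rows, is_categorical):
    """Pairwise Gower dissimilarity for mixed-type rows.

    Assumptions of this sketch: no missing values, equal feature weights,
    numeric features scaled by their range, nominal features compared by
    simple mismatch (no Podani extension for ordinal variables).
    """
    n = len(rows)
    p = len(is_categorical)

    # Range of each numeric column, used to scale differences into [0, 1].
    ranges = []
    for j in range(p):
        if is_categorical[j]:
            ranges.append(None)
        else:
            column = [row[j] for row in rows]
            ranges.append(max(column) - min(column))

    d = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = 0.0
            for j in range(p):
                if is_categorical[j]:
                    total += 0.0 if rows[a][j] == rows[b][j] else 1.0
                elif ranges[j] > 0:
                    total += abs(rows[a][j] - rows[b][j]) / ranges[j]
            d[a][b] = d[b][a] = total / p
    return d


def sqrt_transform(d):
    """Element-wise square root of a dissimilarity matrix."""
    return [[math.sqrt(value) for value in row] for row in d]


# Toy data: one numeric feature and one nominal feature.
rows = [(1.0, "a"), (3.0, "a"), (5.0, "b")]
gower = gower_dissimilarity(rows, is_categorical=[False, True])
dist = sqrt_transform(gower)

# Not executed here: `dist`, converted to a NumPy array, is what would be
# passed to hdbscan.HDBSCAN(metric="precomputed").fit(...).
```

The nested-loop version is only meant to make the definition explicit; for real data a vectorised implementation would be preferable.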

Topic metric distance dbscan python clustering

Category Data Science
