Is my data good for (DBSCAN) clustering?

Question

Is my data good for (DBSCAN) clustering?

M. Fabio

2022年4月28日 06:04

I have a particular dataset consisting of 50k elements with 40 features each. I want to try to cluster the data as it is, without any dimensionality reduction. The main algorithm I am considering is the DBSCAN since is the more versatile and I can accept some poits to result as noise. However how can I judge if the clustering is "significant" since I can't plot the clusters in comparison to the data? Tring to select the paremeters for the DBSCAN I've done a k-nn analysis, but the results worried me. For example, the following is the third nearest neighbor plot. As you can see the distances (y-axis) are pretty much "uniform" along the x-axis (object). Does it mean that the data is somehow uniformly sparse and, in this conditions, clustering is useless?

P.S. I have tried to cluster the data anyway, in particulary using epsilon around 2-2.5 and for different values of min_sample. The silhouette score however results very low, about 0.11, and the noiseless fraction of points is about 80%.

Topic dbscan clustering

Category Data Science

Brian Spiering · Accepted Answer · 2021年3月25日 13:42

You can look at several clustering metrics (e.g., Silhouette Coefficient, Calinski-Harabasz index, or Davies-Bouldin index) and see if they revealing patterns in your clusters.

You can look at relative performance by trying different hyperparameters for DBSCAN and comparing DBSCAN to other clustering techniques.

Is my data good for (DBSCAN) clustering?

About