How can I make HDBSCAN assign every data point to a cluster, so that there is no noise?
TASK
- I am clustering products described by about 70 features, e.g. price, rating (out of 5), and product tag (cleaning, toy, food, fruits)
- I use HDBSCAN to do it
GOAL
- When users visit our site, I want to show them products similar to the one they are viewing.
QUESTION
- How can I get every data point assigned to a cluster, so that no point is labeled as noise?
CODE
import hdbscan
import matplotlib.pyplot as plt
import seaborn as sns

# data: feature matrix (n_products x ~70 features), loaded elsewhere
clusterer = hdbscan.HDBSCAN(
    min_cluster_size=10,  # smallest group of points I consider a cluster
    min_samples=1,  # the LARGER this value, the more points are declared NOISE
).fit(data)

# Clustered points get a palette color; noise (label -1) is drawn gray.
color_palette = sns.color_palette('Paired', 2000)
cluster_colors = [color_palette[x] if x >= 0
                  else (0.5, 0.5, 0.5)
                  for x in clusterer.labels_]
# Desaturate each color by the point's cluster membership strength.
cluster_member_colors = [sns.desaturate(x, p) for x, p in
                         zip(cluster_colors, clusterer.probabilities_)]
# projection: 2-D embedding of data used only for plotting, computed elsewhere
plt.scatter(*projection.T, s=20, linewidth=0, c=cluster_member_colors, alpha=0.25)

labels = clusterer.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
print('Estimated number of clusters: %d' % n_clusters_)
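One fallback I could apply after fitting, shown here only as a minimal sketch: reassign every point labeled -1 to the cluster whose centroid is nearest. This is plain post-processing on the labels, not an HDBSCAN feature, and the helper name `assign_noise_to_nearest_cluster` is hypothetical. It assumes Euclidean distance is meaningful for the features, which may not hold for unscaled prices or encoded tags.

```python
import math
from collections import defaultdict


def assign_noise_to_nearest_cluster(points, labels):
    """Return a copy of labels where each -1 is replaced by the nearest centroid's label."""
    # Group the non-noise points by cluster label.
    members = defaultdict(list)
    for point, label in zip(points, labels):
        if label != -1:
            members[label].append(point)
    if not members:
        raise ValueError("no clusters found, so noise cannot be reassigned")

    # Centroid = per-dimension mean of the cluster's members.
    centroids = {
        label: [sum(dim) / len(pts) for dim in zip(*pts)]
        for label, pts in members.items()
    }

    # Replace each noise label with the label of the closest centroid.
    return [
        label if label != -1
        else min(centroids, key=lambda c: math.dist(point, centroids[c]))
        for point, label in zip(points, labels)
    ]


# Toy data: two tight clusters plus two points labeled as noise.
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.5, 0.4], [4.0, 4.5]]
labels = [0, 0, 1, 1, -1, -1]
full_labels = assign_noise_to_nearest_cluster(points, labels)
print(full_labels)
```

With the fitted clusterer above, and assuming `data` is a NumPy array, the call would be `assign_noise_to_nearest_cluster(data.tolist(), clusterer.labels_.tolist())`. The drawback is that genuine outliers get forced into a cluster, so the "similar products" shown for them may be poor matches.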
Topic noise unsupervised-learning dbscan python clustering
Category Data Science