Why kmeans cluster breakup is like this

Question

Why kmeans cluster breakup is like this

Ayan Mitra

2020年10月1日 07:00

I have a galaxy spectrum data set (total 22000). Similar to an electronic wave data, two dimensional (Flux vs Wavelength). A typical set of wavelength plot looks like below

Now I am doing kmeans on this data set to cluster them based on their spectrum shape/pattern only (using sci-kit learn). Some results of the k means clustering is baffling me, I have made a flow chart of how the candidates clustered as I would go on increasing the number of clusters from k=2 to k=5. The flow chart graph of the division of candidates is below (this is on a smaller subset),

Now my most basic question is why is the division of points in clusters happening like it is seen in the plot, as we traverse down the graph. More specifically, why for example in k=4 case in Cluster=1 it makes a group with such a mix bag ? couldnt kmeans instead select it's constituent 1078 or 580 candidates into a single (more cleaner) cluster ? Also is it a coincidence that there is always an identified 13 members group (in golden arrow) (or a 47).

Topic spectral-clustering supervised-learning k-means clustering

Category Data Science

Why kmeans cluster breakup is like this

About