Why does changing the cluster number change the plot in Kmeans?
This might be a dumb questions but I can't find the answer to it. I don't have the perfect mathematical understanding of kmeans, so apologies if it is.
I'm just wondering why I see a different plot when I change the number of clusters in a kmeans plot?
Here's the code that I'm using:
set.seed(1)
k - kmeans(data, centers = x)
plotcluster(data, k$cluster)
I vary x
to see how the plot looks like. Below are the results for x = 3 and x = 4. Apologies for the poor formatting.
I'm wondering why both plots look different if I'm only varrying the number of clusters. Is it because the principle components that are being show, dc1
and dc2
, are different as you change x
, so as to maximize the variance displayed?
Another quick question is - can you determine the cluster # by how "neat" the clustering looks in the plot? I know there are various methods of doing it, I'm just wondering if the plot is indicative of good/bad clustering as well.
any help is appreciated!
Topic plotting r k-means clustering
Category Data Science