Can K-Means cluster label be fixed

Question

GabS

May 25, 2022 17:03

Is there any way to fix the K-Means cluster labels? I am working with 4 clusters, and whenever I run the Python program from the beginning, the cluster labels change. Is it possible to keep the labels fixed? I have tried setting the random_state parameter, but it does not seem to work.

Topics: unsupervised-learning, machine-learning

Category: Data Science

10xAI · Accepted Answer · June 2, 2021 17:15

It works for me when I set random_state:

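import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

x, y = load_digits(return_X_y=True)
x = (x - x.mean()) / x.std()  # standardise with the global mean/std

def create_cluster(x, k=2, random_state=0):
    # n_init=10 was the default when this was written; setting it explicitly
    # keeps the behaviour (and silences the warning) in newer scikit-learn versions
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=random_state)
    kmeans.fit(x)
    return kmeans

# Same seed -> same initial centroids -> identical labels
y_pred = create_cluster(x, k=25).predict(x)
y_pred_2 = create_cluster(x, k=25, random_state=0).predict(x)
print(np.all(y_pred == y_pred_2))  # True

# Different seed -> different initialisation, so the label numbering differs
y_pred_2 = create_cluster(x, k=25, random_state=1).predict(x)
print(np.all(y_pred == y_pred_2))  # False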
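Note that a fixed random_state only makes the labels reproducible for that particular seed: the numbering of the clusters is arbitrary, so a different seed can return exactly the same partition with the label numbers permuted. A minimal sketch of one way around this, assuming the same data is passed in the same order on every run, is to renumber the clusters by the order in which they first appear in the data. The helper relabel_by_first_appearance and the four-blob toy data are illustrative assumptions, not part of the answer above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def relabel_by_first_appearance(labels):
    """Hypothetical helper: renumber clusters so that the cluster of the first
    sample becomes 0, the next new cluster encountered becomes 1, and so on."""
    labels = np.asarray(labels)
    # uniq is sorted; first_idx[i] is where uniq[i] first occurs in labels
    uniq, first_idx = np.unique(labels, return_index=True)
    # old labels, ordered by their first appearance in the data
    order = uniq[np.argsort(first_idx)]
    # mapping[old_label] = new_label
    mapping = np.empty(uniq.max() + 1, dtype=int)
    mapping[order] = np.arange(len(order))
    return mapping[labels]

# Toy data: four well-separated blobs (illustrative only)
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.5, random_state=42)

# Cluster with three different seeds; raw label numbers may differ between runs
runs = [KMeans(n_clusters=4, n_init=10, random_state=s).fit(X) for s in (0, 1, 2)]
relabelled = [relabel_by_first_appearance(km.labels_) for km in runs]

# After renumbering, identical partitions get identical labels
for labels in relabelled[1:]:
    print(np.array_equal(relabelled[0], labels))
```

This only helps when the runs find the same partition, and the renumbering depends on the sample order. To label new points passed to predict with the same numbering, the old-to-new mapping would also have to be stored and reused (the helper above does not return it).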

Carl · Accepted Answer · June 2, 2021 15:37

The different solutions are most likely caused by random initialisation: K-means only converges to a local optimum, and which one it reaches depends on the starting centroids (once initialised, the iterations themselves are deterministic). Fix the random seed for every package involved, e.g. NumPy and TensorFlow. If you are already doing this, check that you are doing it for all of them.
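A minimal sketch of seeding globally, assuming only Python's random module and NumPy are in use (seed_everything and SEED are hypothetical names). With random_state=None, scikit-learn's KMeans draws its initialisation from NumPy's global RandomState, so seeding NumPy is enough to make repeated runs identical; passing random_state explicitly, as in the other answer, is the more robust option.

```python
import random

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

SEED = 0  # hypothetical project-wide seed

def seed_everything(seed):
    """Hypothetical helper: seed Python's and NumPy's global generators.
    If TensorFlow is used, tf.random.set_seed(seed) would go here too."""
    random.seed(seed)
    np.random.seed(seed)

X, _ = load_digits(return_X_y=True)

def run(seed):
    seed_everything(seed)
    # random_state=None -> scikit-learn uses NumPy's global RandomState,
    # which seed_everything has just reset
    return KMeans(n_clusters=4, n_init=10, random_state=None).fit_predict(X)

labels_a = run(SEED)
labels_b = run(SEED)
print(np.array_equal(labels_a, labels_b))  # True
```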
