Can K-Means cluster labels be fixed?

Is there any way to fix the K-Means cluster labels? I am working with 4 clusters, and whenever I run the Python program from the beginning the cluster labels change. Is it possible to keep the cluster labels fixed? I have tried setting the random_state parameter, but it does not seem to work.

Topic unsupervised-learning machine-learning

Category Data Science


It works for me with random_state:

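from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

x, y = load_digits(return_X_y=True)
x = (x - x.mean()) / x.std()  # scale using the global mean and std

def create_cluster(k=2, random_state=0):
    # Fit K-means on x with a fixed seed so the initialisation is reproducible
    kmeans = KMeans(n_clusters=k, random_state=random_state)
    kmeans.fit(x)
    return kmeans

# Same seed (default random_state=0 vs. explicit 0): labels are identical
y_pred = create_cluster(k=25).predict(x)
y_pred_2 = create_cluster(k=25, random_state=0).predict(x)
print((y_pred == y_pred_2).all())  # True

# Different seed: labels differ
y_pred_2 = create_cluster(k=25, random_state=1).predict(x)
print((y_pred == y_pred_2).all())  # False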

Different solutions are most likely caused by random initialisation: K-means only converges to a local optimum, so the result depends on the starting centroids (once initialised, the algorithm is deterministic). Fix the random seed for every package you use, e.g. numpy and tensorflow. If you are already setting a seed, check that you are doing so for all packages.
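As a minimal sketch of the seeding advice above, assuming scikit-learn and numpy are installed: when random_state is left as None, KMeans draws from numpy's global random generator, so seeding numpy before fitting should be enough to make repeated runs agree. The helper fit_labels and the synthetic make_blobs data are illustrative assumptions, not part of the original question.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 groups (illustrative only, not the asker's data)
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

def fit_labels(seed):
    # Hypothetical helper: seed numpy's global RNG, which KMeans
    # falls back to when random_state is None, then fit and label X
    np.random.seed(seed)
    return KMeans(n_clusters=4, n_init=10).fit_predict(X)

labels_a = fit_labels(0)
labels_b = fit_labels(0)

# Both runs started from the same global RNG state
same_seed_match = bool((labels_a == labels_b).all())
print(same_seed_match)
```

Passing random_state directly to KMeans is usually the more explicit choice, since it does not depend on global state that other code might modify between runs.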
