Elbow method for cosine distance
I have clustered vectors by cosine distance using the NLTK clusterer. If I understand correctly, the Y axis for the elbow method with Euclidean distance is the sum of the squared distances between each cluster's centroid and the vectors that belong to that cluster.
My question is: Would it be the same for clusters using cosine distance?
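For reference, the Euclidean quantity described above can be written as the first line below, where $C_j$ is the set of vectors assigned to cluster $j$ and $\mu_j$ is its centroid (symbols introduced here only for illustration). The second line is one possible cosine analogue, stated as an assumption rather than an established definition: it swaps the Euclidean distance for the cosine distance $1 - \cos(\mu_j, x)$.

```latex
\begin{align}
W_{\text{euclid}}(k) &= \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2 \\
W_{\text{cos}}(k) &= \sum_{j=1}^{k} \sum_{x \in C_j} \left( 1 - \frac{\mu_j \cdot x}{\lVert \mu_j \rVert \, \lVert x \rVert} \right)^2
\end{align}
```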
EDIT: OK, so I tried the sum of squares with cosine distance, and it seemed to return the same values. Here's my code:
EDIT2: My bad, it is working.
from nltk.cluster import KMeansClusterer, cosine_distance
import numpy as np

# Load dataset obtained from http://cs.joensuu.fi/sipu/datasets/a1.txt
testing_vectors = np.loadtxt("a1.txt")

for k in range(1, 10):
    kclusterer = KMeansClusterer(k, distance=cosine_distance)
    # assigned_clusters[i] is the cluster index of testing_vectors[i]
    assigned_clusters = kclusterer.cluster(testing_vectors, assign_clusters=True)
    sum_of_squares = 0
    current_cluster = 0
    for centroid in kclusterer.means():
        current_page = 0
        for index_of_cluster_of_page in assigned_clusters:
            if index_of_cluster_of_page == current_cluster:
                y = testing_vectors[current_page]
                # Euclidean version:
                # sum_of_squares += np.sum((centroid - y) ** 2)
                # Squared cosine similarity: (c.y)^2 / (|c|^2 |y|^2)
                sum_of_squares += (np.dot(centroid, y) ** 2) / (np.dot(centroid, centroid) * np.dot(y, y))
            current_page += 1
        current_cluster += 1
    print("for k=%s the sum of squares is:%s" % (k, sum_of_squares))
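Note that the term accumulated in the loop above is the squared cosine similarity, which grows as clusters get tighter. If the goal is a curve that falls as k increases, as in the Euclidean case, one option is to sum squared cosine distances instead. The following is a minimal sketch under that assumption, not a definitive implementation. It is written in plain Python so it runs without NLTK; the helper `sum_squared_cosine_distances` and the toy data are hypothetical, and `cosine_distance` here reimplements the `1 - cos(u, v)` definition that NLTK's function of the same name uses.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v), the definition used by nltk.cluster.util.cosine_distance."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def sum_squared_cosine_distances(vectors, labels, centroids):
    """Sum the squared cosine distance from each vector to its assigned centroid.

    Hypothetical helper: labels[i] is the index into centroids for vectors[i].
    """
    return sum(
        cosine_distance(centroids[label], vector) ** 2
        for vector, label in zip(vectors, labels)
    )


# Hypothetical toy data, not taken from a1.txt
vectors = [(1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
labels = [0, 0, 1]
centroids = [(1.0, 0.0), (1.0, 1.0)]
toy_score = sum_squared_cosine_distances(vectors, labels, centroids)
print("sum of squared cosine distances:", toy_score)
```

In the loop from the question, the same helper could presumably be called once per k as `sum_squared_cosine_distances(testing_vectors, assigned_clusters, kclusterer.means())`, which also avoids rescanning all assignments for every centroid.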