K-means++ with cosine distance

I am wondering how to implement k-means++ with cosine distance, according to the quote below (from Wikipedia), which says that the distance needs to be squared. But squaring loses the sign of the distance, which in my understanding really matters, e.g.:

cos_dist(x, y) = -1  ⇒  (-1)^2 = 1
  1. Choose one center uniformly at random among the data points.
  2. For each data point x not chosen yet, compute D(x), the distance between x and the nearest center that has already been chosen.
  3. Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D(x)².
  4. Repeat Steps 2 and 3 until k centers have been chosen.
  5. Now that the initial centers have been chosen, proceed using standard k-means clustering.
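The quoted initialization steps can be sketched directly. This is a minimal illustration (not a reference implementation), assuming cosine distance is defined as 1 minus cosine similarity; the function names are my own:

```python
import numpy as np

def cosine_distance(a, b):
    # 1 - cosine similarity; lies in [0, 2], so it is never negative
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def kmeanspp_init(X, k, rng=None):
    """Pick k initial centers from the rows of X using the D(x)^2 weighting."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Step 1: choose the first center uniformly at random
    centers = [X[rng.integers(n)]]
    for _ in range(k - 1):
        # Step 2: D(x) = distance from each point to its nearest chosen center
        d = np.array([min(cosine_distance(x, c) for c in centers) for x in X])
        # Step 3: sample the next center with probability proportional to D(x)^2
        probs = d**2 / (d**2).sum()
        centers.append(X[rng.choice(n, p=probs)])
    return np.array(centers)
```

After the k centers are chosen, standard k-means (Step 5) proceeds as usual; the squared weighting affects only this seeding phase.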

Tags: cosine-distance, k-means, clustering, machine-learning

Category: Data Science


The intuition is to choose a new center that is as far as possible from the existing centers; it does not matter in which direction the new point lies, only how far away it is. Note also that cosine *distance* (1 − cosine similarity) lies in [0, 2] and is never negative, so squaring D(x) loses no sign information: it only gives far-away points a larger selection weight. The sign issue in the question arises from conflating cosine similarity (which ranges over [−1, 1]) with cosine distance.
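A quick numeric check makes the similarity/distance distinction concrete, again assuming cos_dist(x, y) = 1 − cos_sim(x, y):

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([-1.0, 0.0])  # point in the exactly opposite direction

cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
cos_dist = 1.0 - cos_sim  # cosine distance is 1 - similarity

print(cos_sim)       # -1.0: similarity is signed
print(cos_dist)      #  2.0: distance is already nonnegative
print(cos_dist**2)   #  4.0: squaring only amplifies the weight of far points
```

So the D(x)² step in k-means++ is applied to a nonnegative quantity, and no directional information is destroyed.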
