K-means++ with cosine distance

I am wondering how to implement k-means++ with cosine distance, according to the quote below (from Wikipedia), which says that the distance needs to be squared. But squaring loses the sign of the distance, which in my understanding really matters, e.g.:

cos_dist(x, y) = -1  ⇒  (-1)^2 = 1
  1. Choose one center uniformly at random among the data points.
  2. For each data point x not chosen yet, compute D(x), the distance between x and the nearest center that has already been chosen.
  3. Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D(x)².
  4. Repeat Steps 2 and 3 until k centers have been chosen.
  5. Now that the initial centers have been chosen, proceed using standard k-means clustering.
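The quoted initialization steps can be sketched directly. This is a minimal illustration (not a reference implementation), assuming cosine distance is defined as 1 minus cosine similarity; the function names are my own:

```python
import numpy as np

def cosine_distance(a, b):
    # 1 - cosine similarity; lies in [0, 2], so it is never negative
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def kmeanspp_init(X, k, rng=None):
    """Pick k initial centers from the rows of X using the D(x)^2 weighting."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Step 1: choose the first center uniformly at random
    centers = [X[rng.integers(n)]]
    for _ in range(k - 1):
        # Step 2: D(x) = distance from each point to its nearest chosen center
        d = np.array([min(cosine_distance(x, c) for c in centers) for x in X])
        # Step 3: sample the next center with probability proportional to D(x)^2
        probs = d**2 / (d**2).sum()
        centers.append(X[rng.choice(n, p=probs)])
    return np.array(centers)
```

After the k centers are chosen, standard k-means (Step 5) proceeds as usual; the squared weighting affects only this seeding phase.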

Tags: cosine-distance, k-means, clustering, machine-learning

Category: Data Science


The intuition is to choose a new center that is as far as possible from the existing centers; it does not matter in which direction the new point lies, only how far away it is. Note also that cosine *distance* (1 − cosine similarity) lies in [0, 2] and is never negative, so squaring D(x) loses no sign information: it only gives far-away points a larger selection weight. The sign issue in the question arises from conflating cosine similarity (which ranges over [−1, 1]) with cosine distance.
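A quick numeric check makes the similarity/distance distinction concrete, again assuming cos_dist(x, y) = 1 − cos_sim(x, y):

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([-1.0, 0.0])  # point in the exactly opposite direction

cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
cos_dist = 1.0 - cos_sim  # cosine distance is 1 - similarity

print(cos_sim)       # -1.0: similarity is signed
print(cos_dist)      #  2.0: distance is already nonnegative
print(cos_dist**2)   #  4.0: squaring only amplifies the weight of far points
```

So the D(x)² step in k-means++ is applied to a nonnegative quantity, and no directional information is destroyed.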
