Does K-Means' objective function imply the distance metric is Euclidean?

The objective/loss function of the K-Means algorithm is to minimize the sum of squared distances. Written in math form, it looks like this: $$J(X,Z) = \min \sum_{z \in \text{clusters}} \sum_{x \in \text{data}} \lVert x - z \rVert^2$$

If we have a different distance metric, for instance cosine (I realize there's a conversion between cosine and Euclidean distance, but let's set that aside for now), Manhattan, etc., does it mean we will have a different loss function?

That is, the traditional K-Means algorithm based on expectation maximization won't work, right? In every iteration we revise the centroids, normally by computing the average. However, for some metrics other than Euclidean, the average might not be a legitimate representation of the center.
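(For reference, the average is the right center under squared Euclidean distance because setting the gradient to zero gives the mean in closed form: $$\nabla_z \sum_{x \in C} \lVert x - z \rVert^2 = -2\sum_{x \in C}(x - z) = 0 \;\Longrightarrow\; z = \frac{1}{|C|}\sum_{x \in C} x.$$ No analogous closed form is guaranteed for an arbitrary metric, which is exactly the worry.)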

Tags: expectation-maximization, k-means, clustering, machine-learning



Yes, the mean is crucial.

If you just plug in another distance function in place of the sum of squares, the algorithm may fail to converge (and will not find a local optimum). The k-means algorithm relies on both steps (reassignment and mean recomputation) optimizing the same function; if they optimize different functions, you can get an infinite loop in the worst case.
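To make the two coupled steps concrete, here is a minimal NumPy sketch of the standard Lloyd-style iteration (my own illustration, not from the original answer; names are hypothetical):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal Lloyd's algorithm: both steps decrease the same
    sum-of-squared-distances objective, which is why it converges."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Reassignment step: send each point to its nearest center (never increases J).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Mean recomputation step: the mean minimizes the sum of *squared*
        # Euclidean distances within each cluster (also never increases J).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Both steps monotonically decrease the same objective J, so the loop must terminate; swap in a distance the mean doesn't minimize and that guarantee is gone.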

You can use any Bregman divergence. And in some cases, such as Euclidean instead of squared Euclidean distance, nothing really bad happens (no infinite loop, etc.); the algorithm just won't find the best solution, because the mean doesn't minimize Euclidean distances.
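A quick numeric check of that last point, using three 1-D points (my own illustration): for plain (unsquared) distance, the median beats the mean.

```python
import numpy as np

x = np.array([0.0, 0.0, 10.0])           # three 1-D points
mean, median = x.mean(), np.median(x)    # 3.33..., 0.0

print(abs(x - mean).sum())    # ~13.33: sum of unsquared distances to the mean
print(abs(x - median).sum())  # 10.0: the median does strictly better
```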


You're right and you're wrong.

The objective/loss function of the K-Means algorithm is to minimize the sum of squared distances

Yes, absolutely.

written in math form, it looks like this: $$J(X,Z) = \min \sum_{z \in \text{clusters}} \sum_{x \in \text{data}} \lVert x - z \rVert^2$$

Err... sort of. This is definitely the most popular formulation of kmeans, but a more appropriate formulation would be:

$$J(X,Z) = \min \sum_{z \in \text{centroids}} \sum_{x \in \text{data}} \operatorname{distance}(x, z)$$

In your formulation, you're concretely defining distance as squared Euclidean distance, i.e. the squared L2 norm. But you can swap out L2 for any distance kernel and apply kmeans.

The caveat here is that kmeans is a "hill climbing" algorithm, meaning each iteration should be at least as good as the previous one, and this improvement must hold for both the E and M steps. For most common distance metrics (L1, L2, cosine, Hamming...) this is the case and you're good to go, but there are infinitely many possible distance metrics, and if we're going to be technical about it, a randomly chosen distance metric will almost surely fail this criterion.
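One standard way to keep the hill-climbing property when swapping metrics is to pair each distance with the center update that actually minimizes it. A sketch under that assumption (the METRICS table and general_kmeans are my own hypothetical names, not from the answer):

```python
import numpy as np

# Each metric is paired with the center update that minimizes it within a
# cluster: mean for squared L2, per-coordinate median for L1. This pairing is
# what preserves the monotone-improvement guarantee.
METRICS = {
    "l2": (lambda X, c: ((X - c) ** 2).sum(axis=1), lambda X: X.mean(axis=0)),
    "l1": (lambda X, c: np.abs(X - c).sum(axis=1), lambda X: np.median(X, axis=0)),
}

def general_kmeans(X, k, metric="l2", n_iters=100, seed=0):
    dist, center = METRICS[metric]
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # E-step: assign points to the closest center under the chosen metric.
        D = np.stack([dist(X, c) for c in centers], axis=1)
        labels = D.argmin(axis=1)
        # M-step: re-center each cluster with the matching minimizer.
        centers = np.array([
            center(X[labels == j]) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return centers, labels
```

With `metric="l1"` this is essentially k-medians; plugging in a distance whose minimizer you can't compute is where things break down.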

So, to circle back to your question: does the objective function as you formulated it imply the distance metric is Euclidean? Yes. But does kmeans only apply to Euclidean space? No, absolutely not. Use whatever distance metric you want, throw the EM-style algorithm at it, and bam: you've got yourself a non-Euclidean kmeans.

Generally, when people say "kmeans", they're talking about Euclidean kmeans. But kmeans is easy to modify with a different distance metric, and I think only the most pedantic people would argue you shouldn't call it "kmeans" after such a modification. Although it's almost always described with the objective you posted (which, yes, does imply Euclidean distance), you can drop in pretty much any useful distance metric, throw EM at it, and it'll work. Some people might call it "___ kmeans" depending on the distance metric, but really it's all kmeans.

I think part of the reason kmeans isn't usually formulated like this is that it's often compared with Gaussian mixture models (GMMs). With the Euclidean norm, kmeans is equivalent to a GMM with spherical covariance (and hard cluster assignment), and is consequently often described as an easy way to fit a GMM with spherical clusters. But this comparison fails if we use a different distance metric.
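If you want to see that correspondence for yourself, here is a quick scikit-learn comparison one could run (the dataset and parameters are arbitrary choices of mine, purely illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# On well-separated spherical blobs, kmeans centers and the means of a
# spherical-covariance GMM typically land in (almost) the same place.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
gmm = GaussianMixture(n_components=3, covariance_type="spherical",
                      random_state=0).fit(X)

print(km.cluster_centers_)  # compare with...
print(gmm.means_)           # ...the GMM component means
```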

I suggest you check out the CrossValidated discussion on this topic. Full disclosure: the highest voted answers disagree with me, but as you probably guessed I think they're being fairly pedantic.
