What are the benefits of using spectral k-means over simple k-means?

I have understood why k-means can get stuck in local minima. Now, I am curious to know how the spectral k-means helps to avoid this local minima problem.

According to the paper "A Tutorial on Spectral Clustering", the spectral clustering algorithm goes as follows:

  1. Project the data into $R^n$

  2. Define an affinity matrix $A$, using a Gaussian kernel $K$ or an adjacency matrix

  3. Construct the Graph Laplacian from A (i.e. decide on a normalization)

  4. Solve the Eigenvalue problem

  5. Select k eigenvectors corresponding to the k lowest (or highest) eigenvalues to define a k-dimensional subspace

  6. Form clusters in this subspace using k-means
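
For concreteness, here is a minimal sketch of these six steps in Python (my own illustration, not code from the paper; it assumes numpy, scipy and scikit-learn, and the kernel width `sigma`, the symmetric normalization, and the row normalization are illustrative choices):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    # Steps 1-2: affinity matrix from a Gaussian kernel on pairwise distances
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(A, 0)

    # Step 3: symmetric normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(X)) - D_inv_sqrt @ A @ D_inv_sqrt

    # Steps 4-5: eigenvectors of the k smallest eigenvalues span the subspace
    eigvals, eigvecs = eigh(L)      # eigh returns eigenvalues in ascending order
    U = eigvecs[:, :k]
    # row-normalize (as in the Ng-Jordan-Weiss variant) before clustering
    U = U / np.linalg.norm(U, axis=1, keepdims=True)

    # Step 6: ordinary k-means, but in the embedded space
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```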

Step 6 still uses k-means. How does this overcome the local-minima problem of k-means? Moreover, what are the benefits of spectral clustering over plain k-means?

If someone could give detailed insight into this, it would be very helpful.



They are quite different approaches. Spectral embedding is a representation of your data: it maps nearby data points next to each other in a new feature space, which gives k-means more separable clusters to work with than the original space does. But that alone is not my answer to your question! The answer is that you need to know not only the algorithm of spectral clustering but also how it maps data into a new vector space. If you read about that, you will understand it easily. Let me give it a try here:

  1. Spectral embedding (on which you apply a simple clustering method and get spectral clustering!) is basically a graph embedding method. What a graph is, is out of the scope of this answer; I assume you know it. In a graph, a good clustering puts nodes that have many "within" connections (with each other) and few "between" connections (with other parts of the graph) into the same cluster. See the image below.

    [Image: a graph in which the red nodes are densely connected to each other and attached to the rest of the graph by a single edge]

    The red nodes have several connections to each other, but only a single connection relates them to the rest of the graph. That is a good cluster, right?

    An example application is suggesting friends on Facebook. Facebook is a graph in which each node is a person and each connection implies friendship on FB. You have many connections within your circle of friends, and other people have their own "cluster of friends" too. If we cluster people on the FB graph, we can check which people in the same cluster are not yet friends and suggest them to each other, right?!

    Based on the brilliant work of Miroslav Fiedler, "Algebraic Connectivity of Graphs", this graph clustering can be done easily by finding the edges whose removal splits the graph into clusters in the way I mentioned above (see the Fiedler-vector sketch after this list).

    That is essentially what spectral clustering does. Now the question is: "How can I benefit from such a graph representation in a clustering problem that is hard for k-means?"

  2. Look at the image below. It is a famous example of non-linearly separable data on which k-means easily fails (the red points lie between any two blue points sitting on opposite sides of the blue circle, so k-means gets confused!).

    [Image: two concentric rings of points, a red inner circle surrounded by a blue outer circle]

    Now we convert our data into a graph using the rule: each data point is a node, and every node is connected to its k nearest neighbours. What will that graph look like? It will densely connect the nodes of blue points to each other, and the nodes of red points to each other, with probably very few connections between any blue and red node, as they are barely in each other's neighbourhood. What we did was convert non-linearly separable but connected circles in the original space into connected bunches of nodes in the graph representation. Now we just need to turn the graph representation back into numerical vectors, and tadaa! Those strange clusters become two beautifully separable bunches of points on which k-means does not get confused anymore (see the two-circles sketch below)!
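
To make point 1 concrete, here is a tiny sketch (assuming numpy and scipy; the six-node graph is a made-up toy example) of Fiedler's idea: the eigenvector of the second-smallest eigenvalue of the graph Laplacian (the Fiedler vector) splits the graph at exactly such a "bridge" edge.

```python
import numpy as np
from scipy.linalg import eigh

# Two triangles {0,1,2} and {3,4,5} joined by the single bridge edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1

L = np.diag(A.sum(axis=1)) - A   # unnormalized graph Laplacian L = D - A
eigvals, eigvecs = eigh(L)
fiedler = eigvecs[:, 1]          # eigenvector of the 2nd-smallest eigenvalue

# The sign of the Fiedler vector partitions the nodes: one side is {0, 1, 2},
# the other {3, 4, 5} -- the bridge 2-3 is exactly the edge that gets cut.
print(np.where(fiedler < 0)[0], np.where(fiedler >= 0)[0])
```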
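And for point 2, a sketch of the two-circles example (assuming scikit-learn; the parameter values are illustrative choices): plain k-means mixes the two rings, while spectral clustering with a k-nearest-neighbour affinity recovers them.

```python
from sklearn.datasets import make_circles
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.metrics import adjusted_rand_score

# Two concentric rings of points, labels y = inner/outer ring.
X, y = make_circles(n_samples=400, factor=0.4, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0).fit_predict(X)

# Adjusted Rand index vs. the true rings: ~0 means mixed, ~1 means recovered.
print("k-means ARI: ", adjusted_rand_score(y, km))
print("spectral ARI:", adjusted_rand_score(y, sc))
```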

DISCLAIMER: There are many simplifications and some loose uses of terms in my answer. I tried to be intuitive rather than accurate (e.g. our example data actually results in two disconnected subgraphs, which needs a numerical solution to be clustered using spectral clustering, but that does not affect the concept anyway).
