What I want to understand in simple terms is this: suppose I have a similarity matrix for all training examples, giving the amount of similarity between every pair of examples in the training sample. How can I build a classifier or a clustering based only on this information?
Let me try to explain this in simple terms. There is something called cosine similarity, which is calculated between two vectors. It is defined as the cosine of the angle (theta) between them: cos-sim(a,b) = (a · b) / (‖a‖ ‖b‖).
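Here is a minimal sketch in plain Python (standard library only) that computes cosine similarity as the dot product divided by the product of the vector lengths. The function name `cosine_similarity` is just an illustrative choice, and the sketch assumes both vectors are non-zero and have the same length.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (both non-zero, same length)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [0, 1]))  # perpendicular vectors -> 0.0
print(cosine_similarity([1, 2], [2, 4]))  # same direction -> approximately 1.0
```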
Case 1: Let's say we have two vectors a and b that are perpendicular to each other, so the angle between them is 90 degrees. Then cos-sim(a,b) = cos(90°), which is 0. This means the similarity between a and b is 0: the two vectors share no common direction, so they are dissimilar. (Vectors pointing in exactly opposite directions are even more dissimilar, with cos-sim = −1.)
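A concrete instance of Case 1, using the hypothetical perpendicular vectors a = (1, 0) and b = (0, 1):

```latex
\text{cos-sim}(a,b) = \frac{a \cdot b}{\|a\|\,\|b\|}
= \frac{1\cdot 0 + 0\cdot 1}{\sqrt{1^2+0^2}\,\sqrt{0^2+1^2}}
= \frac{0}{1} = 0
```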
Another related concept is cosine distance, which turns the similarity value into a measure of how far apart the two vectors are. It is usually defined as cos-dist(a,b) = 1 − cos-sim(a,b).
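A related identity shows why 1 − cos-sim behaves like a distance: if â and b̂ denote a and b rescaled to unit length, the squared Euclidean distance between them is exactly twice the cosine distance.

```latex
\|\hat a - \hat b\|^2
= \|\hat a\|^2 + \|\hat b\|^2 - 2\,\hat a\cdot\hat b
= 1 + 1 - 2\cos\theta
= 2\bigl(1 - \text{cos-sim}(a,b)\bigr)
= 2\,\text{cos-dist}(a,b)
```

This is why methods that expect Euclidean distances can often be run on length-normalized vectors when cosine similarity is what you actually care about.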
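For our case, cos-dist(a,b) = 1 − cos-sim(a,b) = 1 − 0, which equals 1. For vectors with non-negative components (word counts, for example) this is the largest possible distance; in general cosine distance ranges from 0 to 2, reaching 2 for vectors pointing in opposite directions.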
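The formula cos-dist = 1 − cos-sim can be applied entry by entry to a whole matrix of similarities, which is exactly the situation in the question. A minimal sketch, assuming the matrix is given as a square list of lists of cosine similarities (the function name `similarity_to_distance` is illustrative):

```python
def similarity_to_distance(sim):
    """Turn a square matrix of cosine similarities into cosine distances (1 - s)."""
    n = len(sim)
    if any(len(row) != n for row in sim):
        raise ValueError("similarity matrix must be square")
    # Each entry is converted independently: high similarity -> small distance.
    return [[1.0 - s for s in row] for row in sim]

sim = [
    [1.0, 0.0],
    [0.0, 1.0],
]
print(similarity_to_distance(sim))  # [[0.0, 1.0], [1.0, 0.0]]
```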
Case 2: When theta is 0, cos-sim(a,b) = cos(0°), which is 1. This means the similarity between a and b is 1: the two vectors point in exactly the same direction, so they are highly similar. Now cos-dist(a,b) = 1 − cos-sim(a,b) = 1 − 1, which equals 0, so the distance between the two vectors is at its minimum.
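Case 2 with the hypothetical vectors a = (1, 2) and b = (2, 4): b is twice as long as a but points the same way, so theta = 0. The example also shows that cosine similarity ignores vector length.

```latex
\text{cos-sim}(a,b) = \frac{1\cdot 2 + 2\cdot 4}{\sqrt{1^2+2^2}\,\sqrt{2^2+4^2}}
= \frac{10}{\sqrt{5}\,\sqrt{20}} = \frac{10}{10} = 1,
\qquad
\text{cos-dist}(a,b) = 1 - 1 = 0
```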
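Now consider each of your data points as an n-dimensional vector, and suppose someone gives you a matrix of the cosine similarity between every pair of points in your data. From those similarity values you can compute the distance between any two points (1 − similarity) and then group points that are close to each other into clusters; any clustering method that works from pairwise distances alone, such as hierarchical clustering, can do this. If your points also carry labels (or you treat the discovered clusters as labels), the same distances can drive a distance-based classifier such as k-nearest neighbours, and with more than two classes this becomes a multiclass classification problem. Hope this clears up your doubt.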
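Both ideas can be sketched using nothing but the pairwise similarities. Assumptions: the matrix is symmetric and holds cosine similarities; the clustering shown is single-linkage clustering cut at a hypothetical distance `threshold` (two points share a cluster if a chain of points with cosine distance at most `threshold` connects them); the classifier is k-nearest neighbours, which needs labels for the training examples and, for a new example, its similarities to every training example. The function names, the threshold, and the toy data are all illustrative.

```python
from collections import Counter, deque

def threshold_clusters(sim, threshold):
    """Single-linkage clustering cut at `threshold`: connected components of the
    graph whose edges join points with cosine distance (1 - sim) <= threshold."""
    n = len(sim)
    labels = [-1] * n
    cluster_id = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Breadth-first search over all points reachable through close neighbours.
        labels[start] = cluster_id
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if labels[j] == -1 and 1.0 - sim[i][j] <= threshold:
                    labels[j] = cluster_id
                    queue.append(j)
        cluster_id += 1
    return labels

def knn_predict(sims_to_train, train_labels, k=3):
    """Label a new example by majority vote among its k most similar training examples."""
    ranked = sorted(range(len(train_labels)),
                    key=lambda i: sims_to_train[i], reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy similarity matrix: points 0 and 1 are similar, points 2 and 3 are similar.
sim = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
print(threshold_clusters(sim, threshold=0.3))  # [0, 0, 1, 1]

labels = ["cat", "cat", "dog", "dog"]
new_point = [0.85, 0.95, 0.15, 0.05]  # similarities of a new example to points 0..3
print(knn_predict(new_point, labels, k=3))  # cat
```

If you prefer a library, scikit-learn's `DBSCAN(metric="precomputed")` and `KNeighborsClassifier(metric="precomputed")` accept a distance matrix directly, and `SpectralClustering(affinity="precomputed")` accepts the similarity matrix itself.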