How to get the probability/closeness of a sample belonging to a specific cluster?
I'm new to this, so please let me know if my logic of comparing cosine similarity and k-means is incorrect.
I got a set of 4 clusters from k-means, and I'm now interested in Cluster No. 1. For this cluster, I take the average of all values for each column (giving one mean vector for the cluster) and keep it aside.
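To make the averaging step concrete, here is a minimal sketch in plain Python. The data, the labels, and the helper name `cluster_mean` are all made up for illustration; in practice the labels would come from the fitted k-means model.

```python
# Hypothetical toy data: each row is a sample, each column a feature.
samples = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 5.0],
    [9.0, 8.0, 7.0],
    [3.0, 0.0, 1.0],
]
# Hypothetical cluster labels, one per row, as k-means would assign them.
labels = [1, 1, 0, 1]


def cluster_mean(samples, labels, cluster_id):
    """Average each column over the rows assigned to cluster_id."""
    members = [row for row, lab in zip(samples, labels) if lab == cluster_id]
    if not members:
        raise ValueError(f"no samples in cluster {cluster_id}")
    # zip(*members) iterates over columns instead of rows.
    return [sum(col) / len(members) for col in zip(*members)]


cluster1_mean = cluster_mean(samples, labels, 1)
print(cluster1_mean)
```

As far as I understand, this per-column mean is the same quantity that k-means itself uses as the cluster centroid.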
Now, I have a test sample, for which I run k-means prediction and get 1 as the output, meaning it belongs to Cluster No. 1. That is good for me, but my use case is to calculate how close the sample was to falling in Cluster No. 1, even if it didn't end up belonging to it.
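For reference, this is a rough sketch of what I understand the prediction step to be doing: assigning the sample to the nearest centroid by Euclidean distance. The centroids, the test sample, and the function names below are hypothetical, not my real values.

```python
import math

# Hypothetical centroids for the 4 clusters (list index = cluster number).
centroids = [
    [9.0, 8.0, 7.0],
    [2.0, 2.0, 3.0],
    [0.0, 10.0, 0.0],
    [-5.0, -5.0, -5.0],
]
# Hypothetical test sample with the same 3 features.
test_sample = [2.0, 3.0, 3.0]


def distances_to_centroids(sample, centroids):
    """Euclidean distance from the sample to every centroid."""
    return [math.dist(sample, c) for c in centroids]


def predict(sample, centroids):
    """Return the index of the nearest centroid."""
    dists = distances_to_centroids(sample, centroids)
    return dists.index(min(dists))


dists = distances_to_centroids(test_sample, centroids)
print(predict(test_sample, centroids))  # predicted cluster number
print(dists[1])                         # distance to Cluster No. 1
```

The prediction only reports the nearest cluster, whereas the list of distances keeps a value for every cluster, including Cluster No. 1 when it is not the nearest one.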
Hence, to resolve this, I thought of computing the cosine similarity between my test sample and the vector of per-column averages for Cluster No. 1. In this case, I get a similarity of just 5%.
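Here is a minimal sketch of the cosine similarity calculation, again in plain Python with made-up vectors (the mean vector and both test points are hypothetical). It also prints the Euclidean distance for comparison, since cosine similarity depends only on the direction of the vectors and not on their magnitude.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical per-column mean of Cluster No. 1.
cluster1_mean = [2.0, 2.0, 3.0]
# Hypothetical sample close to the mean.
near = [2.0, 3.0, 3.0]
# Hypothetical sample far away, but pointing in the same direction.
far_same_direction = [20.0, 20.0, 30.0]

for name, sample in [("near", near), ("far", far_same_direction)]:
    print(
        name,
        "cosine:", cosine_similarity(sample, cluster1_mean),
        "euclidean:", math.dist(sample, cluster1_mean),
    )
```

In this toy case the far point has a higher cosine similarity than the near one, even though its Euclidean distance to the mean is much larger, so the two measures can rank samples differently.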
For my use case (getting the probability/closeness of a sample belonging to a specific cluster), I'm not sure which of the two is the better interpretation: the k-means prediction or the cosine similarity?
I know I can use the cluster labels as y variables and build a multi-class classification model, but I want to keep it as unsupervised as possible. Please guide.
Topic unsupervised-learning cosine-distance classification k-means clustering
Category Data Science