How to use a cosine distance matrix with clustering algorithms like mean-shift, DBSCAN, and OPTICS?
I am trying to compare different clustering algorithms on my text data. I first calculated the tf-idf matrix and then derived a cosine distance matrix from it (1 minus the cosine similarity). I have already used this distance matrix for K-means and hierarchical clustering (Ward linkage, with a dendrogram). Now I want to use the same distance matrix for mean-shift, DBSCAN, and OPTICS.
Below is the part of the code that computes the distance matrix.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# define vectorizer parameters (tokenize_and_stem is defined elsewhere)
tfidf_vectorizer = TfidfVectorizer(max_df=0.8, max_features=200000,
                                   min_df=0.2, stop_words='english',
                                   use_idf=True, tokenizer=tokenize_and_stem, ngram_range=(1,3))
%time tfidf_matrix = tfidf_vectorizer.fit_transform(Strategies)  # fit the vectorizer to the documents
terms = tfidf_vectorizer.get_feature_names_out()  # get_feature_names() on scikit-learn < 1.0

# cosine distance = 1 - cosine similarity
dist = 1 - cosine_similarity(tfidf_matrix)
I am new to both Python and clustering. I found code for K-means and hierarchical clustering and tried to understand it, but I cannot work out how to apply it to the other clustering algorithms. A simple explanation of each algorithm, and of how (or whether) this distance matrix can be used with it, would be very helpful.
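To make the question concrete, here is a minimal sketch of what I am hoping is possible. It relies on scikit-learn's `DBSCAN` and `OPTICS` accepting `metric="precomputed"`, so that a square distance matrix can be passed in place of feature vectors. The toy corpus `docs` and the values of `eps` and `min_samples` are placeholders I made up for illustration; they are not my real data or tuned settings. As far as I can tell, scikit-learn's `MeanShift` has no `metric` parameter and expects feature vectors, so I have left it out of the sketch.

```python
import numpy as np
from sklearn.cluster import DBSCAN, OPTICS
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus standing in for my `Strategies` list
docs = [
    "reduce cost by improving supply chain",
    "improve supply chain to reduce cost",
    "supply chain cost reduction",
    "expand marketing on social media",
    "social media marketing expansion",
    "marketing campaigns on social media",
]

tfidf = TfidfVectorizer().fit_transform(docs)
dist = 1 - cosine_similarity(tfidf)

# Rounding error can leave tiny negative entries; a precomputed
# distance matrix has to be non-negative, so clip and zero the diagonal.
dist = np.clip(dist, 0.0, None)
np.fill_diagonal(dist, 0.0)

# eps and min_samples are illustrative guesses, not tuned values
db_labels = DBSCAN(eps=0.7, min_samples=2, metric="precomputed").fit_predict(dist)
op_labels = OPTICS(min_samples=2, metric="precomputed").fit_predict(dist)

print("DBSCAN labels:", db_labels)
print("OPTICS labels:", op_labels)
```

In both cases a label of -1 marks a document treated as noise. What I do not know is how to choose `eps` and `min_samples` sensibly for cosine distances on real tf-idf data.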
Thanks in advance!
Topic mean-shift python-3.x dbscan k-means clustering
Category Data Science