DBSCAN getting one huge cluster with noisy points
I'm currently trying to cluster customer service email answers (NLP).
When I use DBSCAN with TF-IDF embeddings + Annoy indexes, I get good clusters.
But when I use DBSCAN with FastText embeddings + Annoy indexes, the clusters are good except for the one with label zero (0), which seems to include lots of noisy points (points that should be labeled -1 instead of 0).
Does anyone have an idea what could cause this? I'm using eps=0.5 in both cases.
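One way to check whether eps=0.5 means the same thing in both embedding spaces is to compare each point's distance to its k-th nearest neighbour. Below is a minimal numpy-only sketch of that check. The function names and the synthetic data are hypothetical, not my actual pipeline. The toy data assumes that averaged FastText vectors share a strong common component; that is my guess, not something I have confirmed. Adding such a component pulls nearly all cosine distances far below 0.5, which would let DBSCAN chain unrelated emails into one cluster. Also, Annoy's `angular` metric reports sqrt(2(1 - cos)) rather than 1 - cos, so absolute values from an Annoy-based setup will differ from the ones computed here.

```python
import numpy as np


def cosine_distance_matrix(X):
    """Pairwise cosine distances (1 - cosine similarity) between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)  # guard against all-zero rows (empty docs)
    sim = Xn @ Xn.T
    return np.clip(1.0 - sim, 0.0, 2.0)  # clip tiny floating-point overshoots


def k_distances(X, k):
    """Cosine distance from each point to its k-th nearest neighbour (self excluded).

    Sorting these values and looking for a 'knee' is a common heuristic for
    choosing DBSCAN's eps, with k chosen close to min_samples.
    """
    D = cosine_distance_matrix(X)
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbour
    return np.sort(D, axis=1)[:, k - 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical document vectors with no shared direction.
    docs = rng.normal(size=(200, 50))
    # Same vectors plus a large common offset, mimicking (by assumption)
    # averaged word vectors dominated by a shared component.
    shifted = docs + 5.0
    print("median 5-NN distance, no shared component:", np.median(k_distances(docs, 5)))
    print("median 5-NN distance, shared component:   ", np.median(k_distances(shifted, 5)))
```

Running `k_distances` on the real TF-IDF and FastText matrices (with the same metric the Annoy index uses) and plotting the sorted values would show whether a separate eps is needed for each representation.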
Topic fasttext tfidf dbscan scikit-learn machine-learning
Category Data Science