Measuring coherence score for Top2Vec models
I am building several Top2Vec models on Reddit threads. I vary the HDBSCAN cluster size settings so that the Doc2Vec embeddings are clustered into a different number of topics in each model.
I am trying to compare the models using their coherence scores. I tried Gensim's coherence score, but it failed with an error message indicating that a word in the topics is not included in the dictionary.
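To make the dictionary problem concrete, here is a minimal sketch of a UMass-style coherence computed in plain Python instead of Gensim. It assumes the topic words are available as a ranked list of strings (for example from Top2Vec's `get_topics()`) and that the corpus is available as tokenized documents. The function name `umass_coherence`, the `eps` smoothing parameter, and the choice to skip out-of-vocabulary words and average over word pairs are illustrative assumptions, not what Gensim or Top2Vec do internally.

```python
import math


def umass_coherence(topic_words, tokenized_docs, eps=1.0):
    """UMass-style coherence for a single topic (hypothetical helper).

    topic_words: topic terms ordered from most to least representative.
    tokenized_docs: list of documents, each a list of string tokens.
    Words that never occur in the corpus are skipped rather than raising.
    """
    doc_sets = [set(doc) for doc in tokenized_docs]

    # Document frequency: number of documents containing each word.
    doc_freq = {}
    for doc in doc_sets:
        for word in doc:
            doc_freq[word] = doc_freq.get(word, 0) + 1

    # Drop topic words missing from the corpus vocabulary.
    kept = [w for w in topic_words if w in doc_freq]
    if len(kept) < 2:
        return float("nan")  # not enough in-vocabulary words to form a pair

    total, pairs = 0.0, 0
    for i in range(1, len(kept)):
        for j in range(i):
            w_i, w_j = kept[i], kept[j]
            # Number of documents in which both words occur.
            co_freq = sum(1 for doc in doc_sets if w_i in doc and w_j in doc)
            total += math.log((co_freq + eps) / doc_freq[w_j])
            pairs += 1
    return total / pairs  # mean over word pairs; higher is more coherent


docs = [["cat", "dog"], ["cat", "dog", "fish"], ["fish", "boat"]]
print(umass_coherence(["cat", "dog", "unseen"], docs))
```

Averaging this value over all topics of a model would give one number per model, which is the quantity I would like to compare across the different cluster settings.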
I also tried using tmtoolkit. While I could get the document-term matrix (DTM) easily, I have not been able to get the topic-word distribution from Top2Vec.
Questions:
- Can I resolve either of the issues above, that is, get the dictionary to include all of the necessary terms, or produce the topic-word distribution?
- Are there other metrics that can be used to compare Top2Vec models?
Topic coherence topic-model nlp
Category Data Science