I am working on creating a number of Top2Vec models on Reddit threads. I am essentially varying the HDBSCAN cluster size parameter so that the clusters of the Doc2Vec embeddings represent different numbers of topics. I want to compare the resulting models using their coherence scores. I tried Gensim's coherence score but failed: I got an error message indicating that a word in the topics is not included in the dictionary. I also tried tmtoolkit. While I …
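For reference, here is a minimal sketch of the Gensim approach I am attempting, with a workaround that filters each topic's words down to the dictionary's vocabulary (unseen words appear to be what triggers the error). The names `top2vec_model` and `tokenized_docs` are placeholders for my trained model and the token lists it was built from, assuming the tokenization matches what Top2Vec used internally:

```python
from gensim.corpora import Dictionary
from gensim.models.coherencemodel import CoherenceModel

# Dictionary built from the same tokenized documents the model was trained on.
dictionary = Dictionary(tokenized_docs)

# Top2Vec's get_topics() returns the top words per topic.
topic_words, word_scores, topic_nums = top2vec_model.get_topics()

# Drop any topic word the dictionary never saw; these are what raise the
# "word not in dictionary" error inside CoherenceModel.
topics = [
    [word for word in words if word in dictionary.token2id]
    for words in topic_words
]

cm = CoherenceModel(
    topics=topics,
    texts=tokenized_docs,
    dictionary=dictionary,
    coherence="c_v",
)
print(cm.get_coherence())
```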
For an experiment with topic models, I have calculated four coherence values using Gensim's implementation: c_v, u_mass, c_uci, and c_npmi. From this paper, I know that c_v correlates most strongly with human interpretation, so it seems to be the best score to use for topic evaluation. However, are there arguments for using the other measures? And how should these values be interpreted? They seem to lie in different ranges.
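For context, this is roughly how I computed the four values; `lda_model`, `corpus`, `texts`, and `dictionary` are placeholders for my trained model and its inputs:

```python
from gensim.models.coherencemodel import CoherenceModel

for measure in ("c_v", "u_mass", "c_uci", "c_npmi"):
    cm = CoherenceModel(
        model=lda_model,
        corpus=corpus,        # used by u_mass
        texts=texts,          # used by the sliding-window measures
        dictionary=dictionary,
        coherence=measure,
    )
    print(f"{measure}: {cm.get_coherence():.4f}")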
On my corpora, I am running LDA with different settings (I experiment with different numbers of topics, different n-grams, and TF-IDF versus regular BOW). Now, I want to rank these setups to select the single best topic model to continue working with. In order to rank them, I have calculated both the coherence value and the perplexity for all the different settings, as is done here. In the link, the number of topics is selected using the coherence …
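To make the comparison concrete, this is roughly the loop I am running, sketched with placeholder names (`texts` for the tokenized corpus, a matching `dictionary`, and a small grid of topic counts):

```python
import numpy as np
from gensim.models import LdaModel
from gensim.models.coherencemodel import CoherenceModel

corpus = [dictionary.doc2bow(doc) for doc in texts]

results = []
for num_topics in (5, 10, 20, 40):
    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=num_topics, random_state=42, passes=10)
    coherence = CoherenceModel(model=lda, texts=texts,
                               dictionary=dictionary,
                               coherence="c_v").get_coherence()
    # log_perplexity returns the per-word likelihood bound, not
    # perplexity itself; perplexity is 2 ** (-bound), lower is better.
    perplexity = np.exp2(-lda.log_perplexity(corpus))
    results.append((num_topics, coherence, perplexity))

# Rank by coherence (higher is better).
for num_topics, coh, perp in sorted(results, key=lambda r: -r[1]):
    print(f"k={num_topics}: c_v={coh:.4f}, perplexity={perp:.1f}")
```

One thing I noticed while doing this is that Gensim's `log_perplexity` returns the per-word likelihood bound rather than perplexity itself, hence the `2 ** (-bound)` conversion in the sketch.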