How do you work with Latent Dirichlet Allocation in practice?
One needs to provide LDA with a predefined number of latent topics. Let's say I have a text corpus in which I hypothesize there are 10 major topics, each composed of 10 minor subtopics. My objective is to be able to define a proximity measure between documents.
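To make the objective concrete: once a model is fitted, each document is described by a distribution over topics, and proximity could be defined as a distance between those distributions. Below is a minimal sketch of that idea. The Hellinger distance is only my assumption for illustration (Jensen-Shannon divergence would be another candidate), and the topic mixtures are made-up values, not output from a real model.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    Returns 0.0 for identical distributions and 1.0 for distributions
    with no overlapping support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


# Hypothetical per-document topic mixtures (each sums to 1), standing in
# for what a fitted LDA model would return for three documents.
doc_a = [0.70, 0.20, 0.10]
doc_b = [0.60, 0.30, 0.10]
doc_c = [0.05, 0.05, 0.90]

dist_ab = hellinger(doc_a, doc_b)  # similar mixtures, small distance
dist_ac = hellinger(doc_a, doc_c)  # different dominant topic, larger distance
```

With this kind of measure, the quality of the proximity depends directly on how many topics the model uses, which is why the questions below matter to me.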
1) How do you estimate the number of topics in practice? Empirically? With another method, such as the Hierarchical Dirichlet Process (HDP)?
2) Do you build several models? For major and minor topics? Is there a way to capture the hierarchical structure of the topics?
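For reference, the "empirical" option in question 1 could look roughly like the sketch below: fit one LDA model per candidate number of topics and compare held-out perplexity. This assumes scikit-learn; the helper name `perplexity_by_topic_count`, the toy documents, and the candidate values are all hypothetical, and perplexity is only one possible selection criterion.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def perplexity_by_topic_count(train_docs, test_docs, candidate_ks, seed=0):
    """Fit one LDA model per candidate topic count and return
    a dict mapping each count to its held-out perplexity."""
    vectorizer = CountVectorizer()
    X_train = vectorizer.fit_transform(train_docs)
    # Reuse the training vocabulary for the held-out documents.
    X_test = vectorizer.transform(test_docs)

    scores = {}
    for k in candidate_ks:
        lda = LatentDirichletAllocation(n_components=k, random_state=seed)
        lda.fit(X_train)
        scores[k] = lda.perplexity(X_test)  # lower is better
    return scores


# Toy corpus, purely illustrative; a real corpus would be far larger.
train_docs = [
    "cats and dogs are popular pets",
    "dogs chase cats in the garden",
    "stocks and bonds are financial assets",
    "investors trade stocks and bonds",
    "the garden has flowers and trees",
    "trees and flowers grow in spring",
]
test_docs = [
    "pets like cats and dogs",
    "investors buy financial assets",
]

scores = perplexity_by_topic_count(train_docs, test_docs, [2, 3, 4])
best_k = min(scores, key=scores.get)
```

In my setting this would mean scanning values around 10 and around 100, which is expensive and says nothing about the hierarchy between major and minor topics, hence question 2.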
Topic dirichlet
Category Data Science