How to compare the performance of different numbers of mixture components in the EM algorithm?
I am reading about the EM (Expectation-Maximization) algorithm in a machine learning book. In the closing remarks of the chapter, the authors mention that we cannot decide the "optimal" number of components (the number of Gaussians in the mixture) based on each model's final log-likelihood, since a model with more parameters will inevitably fit the training data better.
Therefore, my questions are:
1) How do we compare the performance of models fitted with different numbers of components?
2) What are the important factors that help us decide whether a fitted mixture model is sufficient for modeling the observed data?
Topic expectation-maximization clustering data-mining machine-learning
Category Data Science