How to compare the performance of different numbers of mixture components for the EM algorithm?

I am reading about the EM (Expectation-Maximization) algorithm in a machine learning book. In the closing remarks of the chapter, the authors mention that we cannot decide the "optimality" of the number of components (the number of Gaussians in the mixture) based on each model's final log-likelihood, since models with more parameters will inevitably fit the data better.

Therefore, my questions are:

1) How do we compare the performance of models fitted with different numbers of components?

2) What are the important factors that help us decide that a mixture model fitted with EM is sufficient for modeling the observed data?



1.

The simplest and most common approach is to use AIC or BIC: fit a model for each candidate number of components and pick the one with the minimum AIC/BIC value. AIC and BIC work well here because a fitted mixture model gives you a likelihood, which they penalize by the number of parameters.
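For example, here is a minimal sketch (assuming scikit-learn, whose GaussianMixture is fitted via EM and exposes a bic() method; the toy data is my own illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical toy data: two well-separated 1-D Gaussian clusters.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)]).reshape(-1, 1)

# Fit a GMM (via EM) for each candidate number of components and record BIC.
candidates = range(1, 7)
bics = [GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in candidates]

best_k = list(candidates)[int(np.argmin(bics))]
print(dict(zip(candidates, np.round(bics, 1))))
print("Selected number of components:", best_k)
```

On data like this, the BIC curve typically drops sharply at k=2 and then flattens or rises, which is the pattern you look for.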

Bayesian model selection is another possibility. It is more advanced than AIC or BIC, but it gives you the chance to incorporate your own prior distribution. Section 5.3 in

Machine Learning: A Probabilistic Perspective (Kevin P. Murphy)

has the details. There are also plenty of papers on the topic to be found via Google.
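As one practical relative of this idea (a variational-Bayes approximation rather than full posterior model comparison; this example is my own illustration, not from Murphy's book), scikit-learn's BayesianGaussianMixture puts a prior on the mixing weights, and with a Dirichlet-process prior it drives the weights of unneeded components toward zero:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Same hypothetical toy data: two 1-D Gaussian clusters.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)]).reshape(-1, 1)

bgm = BayesianGaussianMixture(
    n_components=10,  # deliberately over-specified upper bound
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)

# Most weights collapse near zero; roughly two stay substantial.
print(np.round(bgm.weights_, 3))
```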

Alternatively, you can use the cross-validated likelihood as a performance measure, although this can be slow, since it requires fitting each model N times, where N is the number of CV folds.

Slide 17 in

https://www.doc.ic.ac.uk/~dfg/ProbabilisticInference/IDAPISlides13.pdf

is a good reference.
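A sketch of this approach (again assuming scikit-learn; GaussianMixture.score returns the average per-sample log-likelihood, so cross_val_score can use it directly):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score

# Same hypothetical toy data as above.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)]).reshape(-1, 1)

for k in range(1, 7):
    gm = GaussianMixture(n_components=k, random_state=0)
    # cross_val_score refits gm on each training split and calls gm.score
    # on the held-out fold: the mean log-likelihood of unseen data.
    scores = cross_val_score(gm, X, cv=5)
    print(f"k={k}: mean held-out log-likelihood = {scores.mean():.3f}")
```

Unlike the in-sample likelihood, the held-out likelihood stops improving (and usually degrades) once extra components start fitting noise, so it gives a fair comparison across component counts.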

2.

The EM algorithm is a general technique for finding maximum-likelihood parameter estimates and is not limited to Gaussian mixtures. It is useful whenever there is no closed-form solution to the maximum-likelihood problem because the model involves one or more latent variables.
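For concreteness, the standard formulation (textbook material, not specific to any one book) alternates two steps until the log-likelihood converges:

$$\text{E-step:}\quad Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \sim p(Z \mid X,\, \theta^{(t)})}\big[\log p(X, Z \mid \theta)\big]$$

$$\text{M-step:}\quad \theta^{(t+1)} = \arg\max_{\theta}\, Q(\theta \mid \theta^{(t)})$$

Each iteration is guaranteed not to decrease the observed-data log-likelihood $\log p(X \mid \theta)$, which is why the procedure can safely be run to convergence (though it may stop at a local optimum).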
