Conditional Entropy and Mutual Information - Clustering evaluation

First of all, I am doing clustering and I have the true labels for my data. For evaluation, I am using the weighted average (over predicted clusters) of the entropy of the true labels within each predicted cluster. While going over the alternatives, I also came across Mutual Information as a similar approach. On my data, they seem to give similar results.
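For reference, here is a minimal sketch of the score I mean (assuming `labels_true` and `labels_pred` are integer arrays of the same length; the function name is my own):

```python
import numpy as np
from collections import Counter

def weighted_cluster_entropy(labels_true, labels_pred):
    """Weighted average, over predicted clusters, of the entropy of the
    true labels inside each cluster -- i.e. an estimate of H(V|U)."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    n = len(labels_true)
    total = 0.0
    for cluster in np.unique(labels_pred):
        mask = labels_pred == cluster
        counts = np.array(list(Counter(labels_true[mask]).values()), dtype=float)
        probs = counts / counts.sum()
        cluster_entropy = -np.sum(probs * np.log(probs))  # natural log
        total += (mask.sum() / n) * cluster_entropy       # weight by cluster size
    return total
```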

However, there is one issue that puzzles me.

Given the predicted cluster set $U$ and true clusters $V$, mutual information is defined as: $$ I(U,V) = H(U) - H(U|V) $$ or, $$ I(U,V) = H(V) - H(V|U) $$ If my math is correct, the average entropy that I'm using corresponds to the conditional entropy term $H(V|U)$, and trying to minimize it aligns with maximizing the mutual information.
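To spell the correspondence out (using my own notation: $n_k$ for the size of predicted cluster $k$, $n_{kj}$ for the number of points of true class $j$ inside it, and $N$ for the total number of points):

$$ H(V|U) = \sum_k \frac{n_k}{N} \, H(V \mid U = u_k) = -\sum_k \frac{n_k}{N} \sum_j \frac{n_{kj}}{n_k} \log \frac{n_{kj}}{n_k} $$

which is exactly the cluster-size-weighted average of the per-cluster entropies.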

What I cannot see is how the weighted average entropy would differ from mutual information in practice, and why we would need the entropy terms $H(U)$ or $H(V)$ at all. It feels like minimising one of the conditional entropies should suffice.

To put it another way, as far as I understand from the equations, a high entropy of the true or predicted cluster assignments in itself also yields a higher mutual information. Does this mean that mutual information favors equally-sized clusters?

Thanks in advance.



Mutual information does favor many small clusters, because these tend to be "pure". That is why variants such as normalized mutual information (NMI) and adjusted mutual information (AMI) are used instead.
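A quick way to see the effect, as a sketch using scikit-learn's metrics on made-up toy labels: an extreme clustering that puts every point in its own cluster has $H(V|U) = 0$, so its raw MI equals $H(V)$ (the maximum possible), while its AMI stays at zero because a contingency table with singleton clusters carries no information beyond its marginals.

```python
import numpy as np
from sklearn.metrics import (mutual_info_score,
                             normalized_mutual_info_score,
                             adjusted_mutual_info_score)

rng = np.random.default_rng(0)
labels_true = rng.integers(0, 5, size=200)   # 5 true classes

# A random coarse clustering vs. one that puts every point in its own cluster.
labels_coarse = rng.integers(0, 5, size=200)
labels_singletons = np.arange(200)           # perfectly "pure" but useless

for name, pred in [("coarse", labels_coarse), ("singletons", labels_singletons)]:
    print(name,
          "MI:",  round(mutual_info_score(labels_true, pred), 3),
          "NMI:", round(normalized_mutual_info_score(labels_true, pred), 3),
          "AMI:", round(adjusted_mutual_info_score(labels_true, pred), 3))
```

The raw MI rewards the singleton clustering heavily; AMI corrects for the MI expected by chance given the cluster sizes and does not.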

Vinh, N. X.; Epps, J.; Bailey, J. (2009). "Information theoretic measures for clusterings comparison". Proceedings of the 26th Annual International Conference on Machine Learning - ICML '09. p. 1. doi:10.1145/1553374.1553511.
