Why and how does variational inference underestimate variance?
I have read the Quora answer linked here as well, but could not follow it clearly.
Can anyone help me understand, with some theory or a mathematical argument, why and how variational inference underestimates the variance of the true posterior distribution?
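For concreteness, the objective I am asking about is (if I have the setup right) the reverse KL divergence that variational inference minimizes:

$$\mathrm{KL}(q \,\|\, p) = \int q(x)\,\log\frac{q(x)}{p(x)}\,dx$$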
[EDIT]: Adding my understanding of the Quora answer based on a visualization.
The red line is p(x), the true posterior. The green line is q(x), the approximating distribution. The blue line is the pointwise KL contribution, q(x) log(q(x)/p(x)).
Where q(x) is less than p(x), log(q(x)/p(x)) is negative, so the pointwise contributions at those points are negative; in particular, wherever q(x) is close to zero the contribution vanishes no matter how large p(x) is. So minimizing the KL divergence does not push q(x) up in those regions. What it does penalize heavily are regions where q(x) is large but p(x) is small, since there the integrand is large and positive.

Because the objective hardly cares about regions where q(x) is below p(x), but strongly punishes q(x) for placing mass where p(x) is small, the optimal q(x) can shrink onto a single high-density region of p(x) and end up narrower than p(x). Hence, VI can underestimate the variance. A numerical sketch of this is below.
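To check this picture numerically, I put together a small sketch (my own toy example, not from the Quora answer; it assumes NumPy and SciPy are available). It fits a single Gaussian q to a bimodal p by minimizing the reverse KL on a grid. The mixture below has an overall standard deviation of about 2.24, but the fitted q should lock onto one mode with a standard deviation near 1:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

# True posterior p(x): equal mixture of N(-2, 1) and N(2, 1).
# Its overall variance is 1 + 2^2 = 5, i.e. sd ~ 2.24.
p = 0.5 * norm.pdf(x, -2, 1) + 0.5 * norm.pdf(x, 2, 1)

def reverse_kl(params):
    """Grid approximation of KL(q || p) = integral of q log(q/p) dx."""
    mu, log_sigma = params
    q = norm.pdf(x, mu, np.exp(log_sigma))
    # Tiny epsilon avoids log(0); where q == 0 the contribution is 0.
    return np.sum(q * (np.log(q + 1e-300) - np.log(p + 1e-300))) * dx

# Start slightly off-center so the optimizer can break the symmetry.
res = minimize(reverse_kl, x0=[0.5, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

# Expect mean ~ 2 (one mode) and sd ~ 1, well below p's sd of ~ 2.24:
# q avoids regions where p is small, so it cannot cover both modes.
print(f"fitted q: mean = {mu_hat:.2f}, sd = {sigma_hat:.2f}")
```

If my reasoning above is right, the fitted sd should come out far below the true posterior's overall sd, which is exactly the underestimation I am asking about.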