How to derive Evidence Lower Bound in the paper "Zero-Shot Text-to-Image Generation"?

Can someone share the derivation of the Evidence Lower Bound in this paper?

Zero-Shot Text-to-Image Generation

The overall procedure can be viewed as maximizing the evidence lower bound (ELB) (Kingma & Welling, 2013; Rezende et al., 2014) on the joint likelihood of the model distribution over images $x$, captions $y$, and the tokens $z$ for the encoded RGB image. We model this distribution using the factorization $p_{\theta,\psi}(x, y, z) = p_\theta(x \mid y, z)\,p_\psi(y, z)$, which yields the lower bound:

$$\ln p_{\theta,\psi}(x, y) \;\geq\; \mathbb{E}_{z \sim q_\phi(z \mid x)}\Big(\ln p_\theta(x \mid y, z) - \beta\, D_{\mathrm{KL}}\big(q_\phi(y, z \mid x),\, p_\psi(y, z)\big)\Big)$$
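For context, here is a sketch of the standard derivation via Jensen's inequality, using the paper's factorization. Note the paper writes the second term as a KL divergence involving $q_\phi(y, z \mid x)$; since $y$ is observed alongside $x$, this is commonly read as the KL between the variational posterior over $z$ and the prior $p_\psi(y, z)$, which is how the grouping below is arranged (with $\beta = 1$ giving the true bound; $\beta > 1$ is the $\beta$-VAE-style reweighting):

$$
\begin{aligned}
\ln p_{\theta,\psi}(x, y)
&= \ln \sum_z p_\theta(x \mid y, z)\, p_\psi(y, z) \\
&= \ln \mathbb{E}_{z \sim q_\phi(z \mid x)}\!\left[\frac{p_\theta(x \mid y, z)\, p_\psi(y, z)}{q_\phi(z \mid x)}\right] \\
&\geq \mathbb{E}_{z \sim q_\phi(z \mid x)}\!\left[\ln p_\theta(x \mid y, z) + \ln p_\psi(y, z) - \ln q_\phi(z \mid x)\right] \quad \text{(Jensen)} \\
&= \mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\ln p_\theta(x \mid y, z)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid x),\, p_\psi(y, z)\big),
\end{aligned}
$$

i.e. a reconstruction term plus a KL penalty pulling the variational posterior toward the joint prior over captions and image tokens.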
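The inequality can be checked numerically on a toy discrete latent. The sketch below (all distributions are made-up illustrative numbers, not from the paper) fixes an observed $(x, y)$, enumerates three latent values $z$, and verifies that the ELB never exceeds the exact log-evidence:

```python
import math

# Toy check of ln p(x,y) >= E_q[ln p(x|y,z)] - D_KL(q(z|x), p(y,z))
# for a fixed observed (x, y) and a 3-valued latent z.
# All probabilities below are invented for illustration.
p_x_given_yz = [0.6, 0.2, 0.1]   # likelihood p_theta(x|y,z), one entry per z
p_yz = [0.5, 0.3, 0.2]           # prior p_psi(y,z) at the observed y, per z
q_z = [0.7, 0.2, 0.1]            # variational posterior q_phi(z|x)

# Exact log-evidence: ln sum_z p(x|y,z) p(y,z)
log_evidence = math.log(sum(px * pyz for px, pyz in zip(p_x_given_yz, p_yz)))

# ELB = reconstruction term minus KL term (beta = 1)
reconstruction = sum(q * math.log(px) for q, px in zip(q_z, p_x_given_yz))
kl_term = sum(q * math.log(q / pyz) for q, pyz in zip(q_z, p_yz))
elbo = reconstruction - kl_term

print(f"log evidence = {log_evidence:.4f}")
print(f"ELB          = {elbo:.4f}")
assert elbo <= log_evidence  # guaranteed by Jensen's inequality
```

The gap between the two quantities is exactly $D_{\mathrm{KL}}(q_\phi(z \mid x),\, p(z \mid x, y))$, so the bound is tight only when the variational posterior matches the true posterior.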

Topic openai-gpt autoencoder probability expectation-maximization deep-learning

Category Data Science
