Drawing on the explanations in Z-Forcing: Training Stochastic Recurrent Networks:
When the posterior has not collapsed, $z_d$ (the $d$-th dimension of the latent variable $z$) is sampled from $q_{\phi}(z_d|x)=\mathcal{N}(\mu_d, \sigma^2_d)$, where $\mu_d$ and $\sigma_d$ are stable functions of the input $x$. In other words, the encoder distills useful information from $x$ into $\mu_d$ and $\sigma_d$.
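As a rough illustration of what the encoder computes, here is a minimal PyTorch sketch (my own, not from the paper; the architecture, layer sizes, and names like `Encoder` are arbitrary assumptions) of an encoder producing $\mu$ and $\log\sigma^2$, with a reparameterized sample of $z$:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps input x to the per-dimension posterior parameters mu_d, log(sigma_d^2)."""
    def __init__(self, x_dim=784, h_dim=256, z_dim=16):  # sizes are arbitrary
        super().__init__()
        self.body = nn.Sequential(nn.Linear(x_dim, h_dim), nn.Tanh())
        self.mu = nn.Linear(h_dim, z_dim)      # mu_d as a function of x
        self.logvar = nn.Linear(h_dim, z_dim)  # log sigma_d^2 as a function of x

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)

def sample_z(mu, logvar):
    """Reparameterized draw z_d ~ N(mu_d, sigma_d^2)."""
    sigma = torch.exp(0.5 * logvar)
    return mu + sigma * torch.randn_like(mu)
```

When training goes well, `mu` and `logvar` vary meaningfully with `x`; posterior collapse is precisely the failure of that dependence.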
We say the posterior is collapsing when the signal from the input $x$ to the posterior parameters is either too weak or too noisy; as a result, the decoder starts ignoring the $z$ samples drawn from the posterior $q_{\phi}(z|x)$.
A too-noisy signal means $\mu_d$ and $\sigma_d$ are unstable, and thus the sampled $z$'s are unstable as well, which forces the decoder to ignore them. By "ignore" I mean: the decoder's output $\hat{x}$ becomes almost independent of $z$, which in practice translates to producing generic outputs $\hat{x}$ that are crude representatives of all seen $x$'s.
A too-weak signal translates to
$$q_{\phi}(z|x)\simeq q_{\phi}(z)=\mathcal{N}(a,b)$$
which means the $\mu$ and $\sigma$ of the posterior become almost disconnected from the input $x$. In other words, $\mu$ and $\sigma$ collapse to constant values $a$ and $b$, channeling a weak (constant) signal from different inputs to the decoder. As a result, the decoder tries to reconstruct $x$ while ignoring the useless $z$'s, which are sampled from $\mathcal{N}(a,b)$.
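One practical way to spot this (my own diagnostic sketch, assuming the standard normal prior $\mathcal{N}(0,1)$ discussed below; the cutoff `1e-2` is an arbitrary choice) is to average the per-dimension KL over a batch: dimensions whose KL stays near zero have a posterior that matches the prior and thus carries no information about $x$:

```python
import torch

def kl_per_dim(mu, logvar):
    """D_KL(N(mu_d, sigma_d^2) || N(0, 1)) for each latent dimension,
    averaged over the batch; near-zero entries are collapsed dimensions."""
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0)  # shape: (batch, z_dim)
    return kl.mean(dim=0)

# Usage (mu, logvar come from the encoder on a batch of inputs):
# collapsed = kl_per_dim(mu, logvar) < 1e-2
```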
Here are the relevant passages from the paper:
> In these cases, the posterior approximation tends to provide a too weak or noisy signal, due to the variance induced by the stochastic gradient approximation. As a result, the decoder may learn to ignore z and instead to rely solely on the autoregressive properties of x, causing x and z to be independent, i.e. the KL term in Eq. 2 vanishes.
and
> In various domains, such as text and images, it has been empirically observed that it is difficult to make use of latent variables when coupled with a strong autoregressive decoder.
where the simplest form of the KL term, for the sake of clarity, is
$$D_{KL}(q_{\phi}(z|x) \parallel p(z)) = D_{KL}(q_{\phi}(z|x) \parallel \mathcal{N}(0,1))$$
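For the diagonal Gaussian posterior above, this KL term has a well-known closed form per dimension,
$$D_{KL}\big(\mathcal{N}(\mu_d,\sigma_d^2) \parallel \mathcal{N}(0,1)\big) = \frac{1}{2}\left(\mu_d^2 + \sigma_d^2 - \log\sigma_d^2 - 1\right)$$
which is exactly zero when $\mu_d = 0$ and $\sigma_d = 1$, i.e. when the posterior sits on the prior and carries no information about $x$.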
The paper itself uses a more complicated (learned, conditional) Gaussian prior for $p(z)$.