Why Gaussian latent variable (noise) for GAN?

While reading about GANs, the thing I don't understand is why people often choose the input to a GAN ($z$) to be samples from a Gaussian. Are there also potential problems associated with this?

Why do people often choose the input to a GAN ($z$) to be samples from a Gaussian?

Generally, for two reasons: (1) mathematical simplicity, and (2) it works well enough in practice. However, as explained below, under additional assumptions the choice of a Gaussian can be further justified.

Compare to the uniform distribution. The Gaussian distribution is not as simple as the uniform distribution, but it is not far off either. It adds a "concentration around the mean" assumption to uniformity, which gives us the benefit of parameter regularization in practical problems.
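
A quick way to see the two choices side by side is a sampling sketch; here is a minimal example in NumPy, where `batch_size` and `latent_dim` are hypothetical sizes chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, latent_dim = 64, 100   # hypothetical sizes, just for illustration

# Common choice: latent codes drawn from a standard Gaussian N(0, I),
# which concentrates its mass around the mean 0 but has unbounded support.
z_gauss = rng.standard_normal((batch_size, latent_dim))

# Alternative: latent codes drawn uniformly from [-1, 1]^latent_dim,
# which spreads its mass evenly over a bounded box.
z_unif = rng.uniform(-1.0, 1.0, size=(batch_size, latent_dim))

# Either batch would be fed to the generator the same way, e.g. generator(z_gauss).
```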

The least known. The use of a Gaussian is best justified for continuous quantities that are the least known to us, e.g. the noise $\epsilon$ or the latent factor $z$. "The least known" can be formalized as "the distribution that maximizes entropy for a given variance". The solution to this optimization problem is $N(\mu, \sigma^2)$ for an arbitrary mean $\mu$. Therefore, in this sense, if we assume that a quantity is the least known to us, the best choice is a Gaussian. Of course, if we acquire more knowledge about that quantity, we can do better than the "least known" assumption, as illustrated in the examples that follow.
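
Stated explicitly (this is a standard result), the optimization behind "the least known" is

$$\max_{p}\; h(p) = -\int p(x)\,\log p(x)\,dx \quad \text{s.t.} \quad \int p(x)\,dx = 1,\ \int (x-\mu)^2\,p(x)\,dx = \sigma^2,$$

whose maximizer is the Gaussian density $p^*(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, with maximal entropy $h(p^*) = \frac{1}{2}\log(2\pi e\,\sigma^2)$.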

Central limit theorem. Another commonly used justification is that many observations are the result (average) of a large number of [almost] independent processes, so the CLT justifies the choice of a Gaussian. This is not a strong justification, because there are also many real-world phenomena that do not obey normality (e.g. power-law distributions), and since the variable is the least known to us, we cannot decide which of these real-world analogies is more applicable.

This would also be the answer to "why do we assume Gaussian noise in probabilistic regression or the Kalman filter?".

Are there also potential problems associated with this?

Yes. When we assume a Gaussian, we are simplifying. If our simplification is unjustified, our model will under-perform, and at that point we should search for an alternative assumption. In practice, when we make a new assumption about the least known quantity (based on acquired knowledge or speculation), we can extract that assumption into the model and introduce a new Gaussian, instead of changing the Gaussian assumption itself. Here are two examples:

  1. Example in regression (noise). Suppose we have no knowledge about the observation $A$ (the least known), so we assume $A \sim N(\mu, \sigma^2)$. After fitting the model, we may observe that the estimated variance $\hat{\sigma}^2$ is high. After some investigation, we may assume that $A$ is a linear function of a measurement $B$, so we extract this assumption as $A = \color{blue}{b_1B +c} + \epsilon_1$, where $\epsilon_1 \sim N(0, \sigma_1^2)$ is the new "least known". Later, we may find that our linearity assumption is also weak, since, after fitting the model, the observed $\hat{\epsilon}_1 = A - \hat{b}_1B -\hat{c}$ also has a high $\hat{\sigma}_1^2$. Then, we may extract a new assumption as $A = b_1B + \color{blue}{b_2B^2} + c + \epsilon_2$, where $\epsilon_2 \sim N(0, \sigma_2^2)$ is the new "least known", and so on (see the first sketch after this list).

  2. Example in GAN (latent factor). Upon seeing unrealistic outputs from a GAN (knowledge), we may add $\color{blue}{\text{more layers}}$ between $z$ and the output (extract the assumption), in the hope that the new network (or function) with the new $z_2 \sim N(0, \sigma_2^2)$ would lead to more realistic outputs, and so on (see the second sketch after this list).
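
Here is a minimal sketch of the first example on synthetic data; the generating function, sample size, and noise level are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.uniform(-3, 3, size=500)
A = 1.5 * B + 0.8 * B**2 + 2.0 + rng.normal(0, 0.5, size=500)  # hypothetical data

# Step 0: "the least known" -- model A as Gaussian noise around its mean.
print("variance with no predictors:", np.var(A))

# Step 1: extract a linear assumption, A = b1*B + c + eps1, eps1 ~ N(0, sigma1^2).
b1, c = np.polyfit(B, A, deg=1)
eps1 = A - (b1 * B + c)
print("residual variance, linear fit:", np.var(eps1))     # still high -> weak assumption

# Step 2: extract a quadratic assumption, A = b1*B + b2*B^2 + c + eps2.
b2, b1, c = np.polyfit(B, A, deg=2)
eps2 = A - (b2 * B**2 + b1 * B + c)
print("residual variance, quadratic fit:", np.var(eps2))  # close to the true noise variance 0.25
```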
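
And a sketch of the second example, assuming a PyTorch-style generator; the layer sizes and the use of `torch.nn` are my assumptions, not part of the original answer:

```python
import torch
import torch.nn as nn

latent_dim, out_dim = 100, 784   # hypothetical sizes (e.g. flattened 28x28 images)

# A shallow mapping from z to the output...
shallow_gen = nn.Sequential(
    nn.Linear(latent_dim, out_dim),
    nn.Tanh(),
)

# ...versus a generator with more layers between z and the output,
# keeping the same "least known" Gaussian assumption on the latent code.
deeper_gen = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, out_dim), nn.Tanh(),
)

z = torch.randn(64, latent_dim)   # Gaussian latent codes, as discussed above
print(shallow_gen(z).shape, deeper_gen(z).shape)  # both: torch.Size([64, 784])
```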
