Treating Word Embeddings as Multivariate Gaussian Random Variables
I want to specify a probabilistic clustering model (such as a mixture model or LDA) over words, and instead of the traditional approach of representing each word as an indicator (one-hot) vector, I want to use the corresponding word embeddings extracted from word2vec, GloVe, etc. as input.
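For concreteness, here is a minimal sketch of the setup I have in mind, assuming gensim's Word2Vec and scikit-learn's GaussianMixture (the corpus here is just a toy placeholder):

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.mixture import GaussianMixture

# Toy corpus; in practice this would be my real tokenized text.
sentences = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "ran"]]
w2v = Word2Vec(sentences, vector_size=100, min_count=1, epochs=50)

# One 100-dimensional embedding per vocabulary word.
words = list(w2v.wv.index_to_key)
X = np.vstack([w2v.wv[w] for w in words])        # shape: (vocab_size, 100)

# Cluster the embeddings instead of one-hot indicator vectors.
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
labels = gmm.fit_predict(X)                      # one cluster id per word
```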
While feeding the word embeddings from my word2vec model into my GMM, I observed that each feature was approximately normally distributed, i.e. features 1..100 each looked Gaussian across my vocabulary. Can anyone explain why that is the case? In my understanding, word embeddings are just the weights of a shallow neural network. Are they always supposed to be normally distributed?
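By "each feature had a normal distribution" I mean roughly the following per-dimension check (a sketch; in my case X is the (vocab_size, 100) embedding matrix from above, and the test assumes a reasonably large vocabulary):

```python
import numpy as np
from scipy import stats

X = np.random.randn(5000, 100)   # placeholder for the real embedding matrix

# D'Agostino-Pearson normality test on each of the 100 embedding dimensions.
p_values = np.array([stats.normaltest(X[:, j]).pvalue for j in range(X.shape[1])])
print((p_values < 0.05).sum(), "of", X.shape[1], "features reject normality at the 5% level")
```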
Furthermore, when using doc2vec embeddings, my features were uniformly distributed, which goes against the earlier observation that the word2vec embeddings were normally distributed. Can anyone explain this discrepancy?
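For the doc2vec case, the vectors I tested come from something like this (assuming gensim's Doc2Vec; the documents are placeholders), and I applied the same per-feature check to the resulting matrix:

```python
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(words=["the", "cat", "sat"], tags=[0]),
        TaggedDocument(words=["the", "dog", "ran"], tags=[1])]
d2v = Doc2Vec(docs, vector_size=100, min_count=1, epochs=50)

# Stack one 100-dimensional vector per document; D gets the same normality test as X above.
D = np.vstack([d2v.dv[i] for i in range(len(docs))])
```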
Topic doc2vec gmm word2vec nlp machine-learning
Category Data Science