Word2Vec vs. Doc2Vec Word Vectors

I am doing some analysis on document similarity and am also interested in word similarity. I know that doc2vec builds on word2vec and, in its default (PV-DM) mode, also trains word vectors that we can access.

My question is:

Should we expect these word vectors, and by association methods such as most_similar, to be 'better' than word2vec's, or are they essentially going to be the same? If in the future I only want word similarity, should I just default to word2vec?

Topic: doc2vec, word2vec, nlp

Category: Data Science


If you only care about word similarity, then apply Occam's Razor and use word2vec. There is no need to increase model complexity if the extra capability (document vectors) is not going to be used.

Also, embedding quality is driven primarily by the size and diversity of the training corpus; the choice of algorithm has a much smaller effect.
