Difference between Doc2Vec and BERT

I am trying to understand the difference between Doc2Vec and BERT. I understand that Doc2Vec tags each document with a paragraph ID, for which it learns a paragraph vector. What I am not sure about is whether that paragraph vector actually helps capture the context of the words in the document.

Moreover, BERT clearly captures context and assigns different vectors to a word such as "bank". For instance:

  1. I robbed a bank.
  2. I was sitting by the bank of a river.

BERT would produce different vectors for the word "bank" here. I am trying to understand whether Doc2Vec also captures this context, given that the paragraph ID would be different for each sentence.
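To make this concrete, here is a minimal sketch of what I mean, using the Hugging Face transformers library (the choice of bert-base-uncased and the cosine-similarity check are just my illustration):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    # Encode the sentence and pick out the hidden state at the "bank" token.
    inputs = tokenizer(sentence, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    idx = tokens.index("bank")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)
    return hidden[0, idx]

v1 = bank_vector("I robbed a bank")
v2 = bank_vector("I was sitting by the bank of a river.")

# A cosine similarity well below 1.0 shows the two "bank" vectors differ.
print(torch.cosine_similarity(v1, v2, dim=0).item())
```

Can anyone please help with this?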

Tags: doc2vec, bert, transformer, nlp, machine-learning

Category: Data Science


The main difference is that BERT includes attention mechanisms, whereas Doc2Vec doesn't.

Attention mechanisms are functions that capture the context between words: each word's representation is built from the other words in the sequence, weighted by learned attention weights that depend on the words' content and positions.
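To illustrate, here is a minimal NumPy sketch of the scaled dot-product attention at the core of BERT's Transformer layers (simplified: no learned projections and a single head, so this shows the idea rather than BERT's exact computation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Each output row is a weighted mix of the rows of V; the weights say
    # how strongly each position attends to every other position.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

# Toy example: 4 "tokens" with 8-dimensional vectors attending to each other.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```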

This contextual treatment of the data generally gives better results than classic embedding approaches like Doc2Vec.
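This is also the direct answer to the question above: in Doc2Vec the two sentences do get different paragraph vectors (one per paragraph ID), but the word "bank" itself keeps a single static vector shared by both sentences. A minimal sketch with gensim (the tiny corpus and tags are made up for illustration; a real model needs far more data):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument("i robbed a bank".split(), tags=["d1"]),
    TaggedDocument("i was sitting by the bank of a river".split(), tags=["d2"]),
]
model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)

# The two *document* vectors differ...
print(model.dv["d1"][:5])
print(model.dv["d2"][:5])

# ...but "bank" has exactly one vector, regardless of which sentence it is in.
print(model.wv["bank"][:5])
```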

On the other hand, BERT can handle out-of-vocabulary words because it uses subwords (e.g. "sub" + "word" + "s") instead of complete words (e.g. "subwords"), which preserves more information about rare or unseen words.
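For example, with the Hugging Face tokenizer (the exact split depends on the tokenizer's learned vocabulary, so treat the output as illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# A word missing from the vocabulary is split into known subword pieces;
# in WordPiece, "##" marks a piece that continues the previous one.
print(tokenizer.tokenize("subwords"))
```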
