When would you use word2vec over BERT?
I am very new to Machine Learning and I have recently been exposed to word2vec and BERT.
From what I know, word2vec assigns a single fixed vector to each word in its vocabulary, regardless of context. This means that for a word with multiple meanings, the algorithm may output a representation that reflects the wrong sense.
BERT, on the other hand, is able to use context clues in the sentence to capture the intended meaning of the word.
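Here is a minimal sketch of the difference as I understand it, assuming `gensim`, `transformers`, and `torch` are installed; the toy corpus and the two "bank" sentences are just illustrative examples I made up:

```python
# Contrast static (word2vec) vs. contextual (BERT) embeddings.
from gensim.models import Word2Vec
import torch
from transformers import AutoTokenizer, AutoModel

# --- word2vec: one fixed vector per word type ---
corpus = [
    ["she", "deposited", "cash", "at", "the", "bank"],
    ["they", "fished", "from", "the", "river", "bank"],
]
w2v = Word2Vec(corpus, vector_size=50, min_count=1, seed=0)
# "bank" maps to a single vector no matter which sense was meant.
print(w2v.wv["bank"].shape)  # (50,)

# --- BERT: a different vector for each occurrence, shaped by context ---
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]  # (seq_len, 768)
    # Find the position of the "bank" token in this sentence.
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids("bank"))
    return hidden[idx]

v_money = bank_vector("She deposited cash at the bank.")
v_river = bank_vector("They fished from the river bank.")
# The two "bank" vectors differ; cosine similarity is well below 1.
print(torch.cosine_similarity(v_money, v_river, dim=0).item())
```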
To me, it sounds like BERT would always be the better choice when it comes to identifying the meaning of a word.
Could someone explain when word2vec would be more useful?