Using BERT instead of word2vec to extract most similar words to a given word
I am fairly new to BERT, and I want to test two approaches for getting the most similar words to a given word, to use in Snorkel labeling functions for weak supervision.
My first approach was to use word2vec with the pre-trained word2vec-google-news-300 embeddings to find the most similar words:
@labeling_function()
def lf_find_good_synonyms(x):
    # Extract the 25 words most similar to "good" from the pre-trained vectors
    good_synonyms = word_vectors.most_similar("good", topn=25)
    # syn_list returns the stemmed form of each similar word
    good_list = syn_list(good_synonyms)
    return POSITIVE if any(word in x.stemmed for word in good_list) else ABSTAIN
But since word2vec is an older approach, I wanted to test whether BERT can do the same. Is that possible?
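Here is a minimal sketch of what I imagine the BERT equivalent might look like (this is an assumption on my part, not working code I have verified for my task). It uses the Hugging Face transformers library with the bert-base-uncased checkpoint. Since BERT produces contextual embeddings rather than one vector per word, the sketch falls back on the model's static input-embedding matrix as a rough word2vec stand-in:

```python
# Sketch, assuming the Hugging Face transformers library and the
# bert-base-uncased checkpoint. BERT has no single vector per word
# (its embeddings are contextual), so this ranks the whole vocabulary
# by cosine similarity in the model's *static* input-embedding matrix.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def most_similar_bert(word, topn=25):
    # (vocab_size, hidden_size) matrix of input token embeddings
    emb = model.get_input_embeddings().weight.detach()
    word_id = tokenizer.convert_tokens_to_ids(word)
    vec = emb[word_id].unsqueeze(0)
    # Cosine similarity of the query vector against every vocabulary row
    sims = torch.nn.functional.cosine_similarity(vec, emb)
    sims[word_id] = -1.0  # drop the query word itself from the ranking
    top_ids = torch.topk(sims, topn).indices
    return [tokenizer.convert_ids_to_tokens(i.item()) for i in top_ids]
```

In principle I could then call most_similar_bert("good", topn=25) in place of word_vectors.most_similar in the labeling function above. My concern is that using the static embedding layer throws away the contextual part that makes BERT interesting, so maybe a contextual approach (or a sentence-transformers model) would be more appropriate?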
Topic: snorkel, bert, word2vec, nlp
Category: Data Science