Using BERT instead of word2vec to extract most similar words to a given word
I am fairly new to BERT, and I want to test two approaches for getting the most similar words to a given word, to use in Snorkel labeling functions for weak supervision.
My first approach was to use word2vec with the pre-trained word2vec-google-news-300 embeddings to find the most similar words:
@labeling_function()
def lf_find_good_synonyms(x):
    # Extract the 25 words most similar to "good" from the pre-trained vectors
    good_synonyms = word_vectors.most_similar("good", topn=25)
    # syn_list returns the stemmed form of each similar word
    good_list = syn_list(good_synonyms)
    return POSITIVE if any(word in x.stemmed for word in good_list) else ABSTAIN
But since word2vec is an older approach, I wanted to test whether BERT can do the same. Is that possible?
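Here is a minimal sketch of what I imagine the BERT equivalent might look like (this is an assumption on my part, not working code I have verified for my task). It uses the Hugging Face transformers library with the bert-base-uncased checkpoint. Since BERT produces contextual embeddings rather than one vector per word, the sketch falls back on the model's static input-embedding matrix as a rough word2vec stand-in:

```python
# Sketch, assuming the Hugging Face transformers library and the
# bert-base-uncased checkpoint. BERT has no single vector per word
# (its embeddings are contextual), so this ranks the whole vocabulary
# by cosine similarity in the model's *static* input-embedding matrix.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def most_similar_bert(word, topn=25):
    # (vocab_size, hidden_size) matrix of input token embeddings
    emb = model.get_input_embeddings().weight.detach()
    word_id = tokenizer.convert_tokens_to_ids(word)
    vec = emb[word_id].unsqueeze(0)
    # Cosine similarity of the query vector against every vocabulary row
    sims = torch.nn.functional.cosine_similarity(vec, emb)
    sims[word_id] = -1.0  # drop the query word itself from the ranking
    top_ids = torch.topk(sims, topn).indices
    return [tokenizer.convert_ids_to_tokens(i.item()) for i in top_ids]
```

In principle I could then call most_similar_bert("good", topn=25) in place of word_vectors.most_similar in the labeling function above. My concern is that using the static embedding layer throws away the contextual part that makes BERT interesting, so maybe a contextual approach (or a sentence-transformers model) would be more appropriate?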
Topic: snorkel, bert, word2vec, nlp
Category: Data Science