Using BERT instead of word2vec to extract the most similar words to a given word

I am fairly new to BERT, and I want to test two approaches to getting "the most similar words" to a given word, for use in Snorkel labeling functions for weak supervision. The first approach was to use word2vec with the pre-trained "word2vec-google-news-300" embeddings to find the most similar words:

    @labeling_function()
    def lf_find_good_synonyms(x):
        good_synonyms = word_vectors.most_similar("good", topn=25)  # similar words are extracted here
        good_list = syn_list(good_synonyms)  # syn_list just returns the stemmed similar words
        return POSITIVE if any(word in x.stemmed for …
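Since the snippet above is cut off, here is a minimal runnable reconstruction of the word2vec approach. The body of syn_list and the tail of the return statement are assumptions filled in from their descriptions, not from the original post; x is assumed to carry a stemmed field, and POSITIVE/ABSTAIN follow the usual Snorkel integer-label convention:

    import gensim.downloader as api
    from nltk.stem import PorterStemmer
    from snorkel.labeling import labeling_function

    POSITIVE, ABSTAIN = 1, -1  # assumed Snorkel label convention

    word_vectors = api.load("word2vec-google-news-300")  # large download on first use
    stemmer = PorterStemmer()

    def syn_list(similar_words):
        # most_similar returns (word, score) pairs; keep the stemmed words only
        # (hypothetical helper, reconstructed from the comment in the question)
        return {stemmer.stem(word) for word, _score in similar_words}

    # Compute the synonym set once, not on every call to the LF
    good_list = syn_list(word_vectors.most_similar("good", topn=25))

    @labeling_function()
    def lf_find_good_synonyms(x):
        return POSITIVE if any(word in x.stemmed for word in good_list) else ABSTAIN

One design note: most_similar is hoisted out of the labeling function and computed once at module level, since its result never changes between calls.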
Category: Data Science

What if my Snorkel labeling function has very low coverage over a development set?

I am trying to label a span-recognition dataset using Snorkel and am currently at the stage of improving my labeling functions. One of the LFs has rather low coverage because it labels only a subclass of one of the entity spans. What would be the impact of low-coverage labeling functions on the final downstream span recognition model?
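For reference, coverage here is the fraction of dev-set examples a labeling function does not abstain on, which Snorkel reports via LFAnalysis. A self-contained toy sketch of how it is measured (the LF, data frame, and gold labels below are hypothetical placeholders, not from the original post):

    import numpy as np
    import pandas as pd
    from snorkel.labeling import LFAnalysis, PandasLFApplier, labeling_function

    POSITIVE, ABSTAIN = 1, -1

    @labeling_function()
    def lf_contains_good(x):
        # Toy LF that fires only on a narrow pattern, so its coverage is low
        return POSITIVE if "good" in x.text else ABSTAIN

    df_dev = pd.DataFrame({"text": ["a good movie", "terrible", "fine acting", "so good"]})
    Y_dev = np.array([1, 0, 0, 1])  # toy gold labels for the dev set

    applier = PandasLFApplier(lfs=[lf_contains_good])
    L_dev = applier.apply(df=df_dev)  # label matrix: one column per LF

    # Passing Y also reports each LF's empirical accuracy on the dev set
    print(LFAnalysis(L=L_dev, lfs=[lf_contains_good]).lf_summary(Y=Y_dev))

Running this prints a per-LF summary whose Coverage column is the statistic the question refers to.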
Category: Data Science
