Weighting Sentence Similarity by salience or frequency

Question

Aaron Casey

March 21, 2022, 23:20

It seems like the new standard in text search is sentence or document similarity, computed with things like BERT sentence embeddings. However, these similarity scores have no way to account for how salient a sentence is, which can make it hard to compare different searches.

For example, when using concept embeddings, I'd like to score the match Exam - Exam as less important than the match Diabetes - High blood sugar. But the former is an exact match, so it gets a similarity score of 1.

I've tried weighting by the inverse document frequency (IDF) of terms, but because IDF is unbounded, it's really hard to figure out how to combine the similarity and frequency scores. IDF weighting is also just not as sophisticated, since it doesn't account for synonymy.
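
To make the scaling problem concrete, here is a minimal sketch of one possible way to put IDF on the same [0, 1] scale as cosine similarity, by dividing log(N / df) by its maximum log(N). This is an illustration under assumptions, not a tested solution: the embedding vectors, document frequencies, corpus size, and the function names `bounded_idf` and `weighted_similarity` are all hypothetical, and combining the two term weights with a geometric mean is just one arbitrary choice. It also does nothing about the synonymy issue.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bounded_idf(doc_freq, n_docs):
    """IDF rescaled to [0, 1]: log(N / df) / log(N).

    Assumes 1 <= doc_freq <= n_docs and n_docs > 1.
    Returns 0 for a term in every document, 1 for a term in exactly one.
    """
    return math.log(n_docs / doc_freq) / math.log(n_docs)


def weighted_similarity(u, v, df_u, df_v, n_docs):
    """Cosine similarity scaled by the geometric mean of the bounded IDFs."""
    salience = math.sqrt(bounded_idf(df_u, n_docs) * bounded_idf(df_v, n_docs))
    return cosine(u, v) * salience


# Hypothetical toy data: made-up 3-d "embeddings" and document frequencies
# in an imagined corpus of 1000 documents.
N_DOCS = 1000
exam = ([1.0, 0.0, 0.0], 900)              # very common concept
diabetes = ([0.1, 0.9, 0.3], 20)           # rare concept
high_blood_sugar = ([0.2, 0.8, 0.4], 15)   # rare, near-synonym

score_exam = weighted_similarity(exam[0], exam[0], exam[1], exam[1], N_DOCS)
score_diabetes = weighted_similarity(
    diabetes[0], high_blood_sugar[0], diabetes[1], high_blood_sugar[1], N_DOCS
)

print(f"Exam - Exam:                {score_exam:.3f}")
print(f"Diabetes - High blood sugar: {score_diabetes:.3f}")
```

With these made-up numbers, the exact but uninformative Exam - Exam match ends up scored below the Diabetes - High blood sugar match, because both factors now live in [0, 1] and can simply be multiplied.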

Has anybody come up with any solutions to this?

Topic semantic-similarity bert word-embeddings nlp search

Category Data Science
