Weighting sentence similarity by salience or frequency

Sentence or document similarity, computed with things like BERT sentence embeddings, seems to be the new standard in text search. However, these similarity scores have no built-in way to account for how salient a sentence is, which makes it hard to compare results across different searches.

For example, when using concept embeddings, I'd like to be able to score the match Exam - Exam as less important than Diabetes - High blood sugar. But the former is an exact match, so it obviously has a similarity score of 1.

I've tried weighting by the inverse document frequency (IDF) of terms, but because IDF is unbounded while the similarity score is bounded, it's really hard to figure out how to combine the two. It's also just not as sophisticated, since term-level IDF doesn't account for synonymy.
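
To make the scaling problem concrete, here is a minimal sketch of one possible way to combine the two scores: rescale a smoothed IDF into [0, 1] and multiply it with cosine similarity. Everything in it is an assumption for illustration: the function names (`bounded_idf`, `weighted_similarity`), the toy 2-d vectors, the document frequencies, and the choice of `min` to combine the two weights are all made up, not taken from any library.

```python
import math

def cosine(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def bounded_idf(doc_freq, n_docs):
    """Smoothed IDF divided by its maximum, so the result lies in [0, 1]."""
    idf = math.log((1 + n_docs) / (1 + doc_freq))
    max_idf = math.log(1 + n_docs)  # value reached when doc_freq == 0
    return idf / max_idf

def weighted_similarity(u, v, df_u, df_v, n_docs):
    """Cosine similarity scaled by the salience of the more common item.

    Taking the minimum is an assumption; a mean or product would also work.
    """
    salience = min(bounded_idf(df_u, n_docs), bounded_idf(df_v, n_docs))
    return cosine(u, v) * salience

# Hypothetical toy data: embeddings and document frequencies are invented.
N_DOCS = 1000
exam = [1.0, 0.0]
diabetes = [0.6, 0.8]
high_blood_sugar = [0.8, 0.6]

exam_score = weighted_similarity(exam, exam, 900, 900, N_DOCS)
diabetes_score = weighted_similarity(diabetes, high_blood_sugar, 50, 30, N_DOCS)
print(exam_score < diabetes_score)
```

With these made-up numbers the exact match Exam - Exam ends up ranked below Diabetes - High blood sugar, because "Exam" occurs in most documents. If the frequencies are counted per concept rather than per surface term, synonyms share one count, which may partly address the synonymy concern.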

Has anybody come up with any solutions to this?

