How to choose a similarity measure between sentences and paragraphs

Question

Mahler

February 7, 2022, 14:47

Problems

1. How to find an appropriate measurement method

There are several ways to measure sentence similarity, but I don't know how to decide which of them is appropriate for my data (sentences).

Related question on Stack Overflow: Is there a way to check similarity between two full sentences in Python?

2. Sentence or paragraph based

If I have access to both a single sentence and the paragraph that contains it, which gives a more accurate similarity measurement: comparing sentences or comparing paragraphs?

What I tried so far

1. I tried one of the available libraries (spaCy) to measure the similarity.

However, I'm struggling to find a more accurate method for measuring similarity.

original = 'New York is a noisy city where hamburgers are famous.'
test = ['Berlin is a nostalgic city where sausages are famous.', 'Both New York and Belin are noisy cities, but hamburgers are famous in New York rather than in Berlin.']

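import spacy

# The model name must be passed as a string.
# Note: en_core_web_sm ships without static word vectors, so similarity()
# is less reliable here than with en_core_web_md or en_core_web_lg.
nlp = spacy.load('en_core_web_sm')

# Compare the original sentence against each test sentence.
doc1 = nlp(original)
for sentence in test:
    doc2 = nlp(sentence)
    print(doc1.similarity(doc2))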

0.8682034221008
0.5078180005337849
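
One rough way to choose between measures is to run several of them on the same sentence pairs and check which ranking agrees with your own judgment. The following is only a minimal sketch of that idea using the standard library. The helper names `tokenize`, `jaccard`, and `cosine_bow` are hypothetical, and these purely lexical baselines are not the spaCy method used above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def jaccard(a, b):
    """Fraction of distinct words shared by the two texts."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_bow(a, b):
    """Cosine similarity between raw word-count vectors."""
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


original = 'New York is a noisy city where hamburgers are famous.'
test = ['Berlin is a nostalgic city where sausages are famous.',
        'Both New York and Belin are noisy cities, but hamburgers are famous in New York rather than in Berlin.']

# Print both baseline scores for each test sentence, side by side.
for sentence in test:
    print(round(jaccard(original, sentence), 3),
          round(cosine_bow(original, sentence), 3))
```

Both baselines only count overlapping words, so they cannot see that "hamburgers" and "sausages" are related. If they rank the pairs the same way as an embedding-based score, the embedding may be reacting mostly to shared wording rather than meaning.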

As with sentences, I found that there are several methods for measuring the similarity between paragraphs.

But I have no clue which generally performs better: comparing at the sentence level or at the paragraph level.
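
To make the sentence-versus-paragraph question concrete, the two options can be scored side by side: the query against the whole paragraph, and the query against its best-matching sentence. This is a minimal sketch under stated assumptions. The paragraph is just the two test sentences joined for illustration, the helpers `bow`, `cosine`, `split_sentences`, and `paragraph_scores` are hypothetical names, the sentence splitter is deliberately naive, and word-count cosine stands in for whatever measure is actually chosen.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words counts over lowercase word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = bow(a), bow(b)
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def paragraph_scores(query, paragraph):
    """Return (similarity to whole paragraph, best single-sentence similarity)."""
    whole = cosine(query, paragraph)
    best = max((cosine(query, s) for s in split_sentences(paragraph)), default=0.0)
    return whole, best


query = 'New York is a noisy city where hamburgers are famous.'
# Illustrative paragraph built from the two test sentences (an assumption).
paragraph = ('Berlin is a nostalgic city where sausages are famous. '
             'Both New York and Belin are noisy cities, but hamburgers '
             'are famous in New York rather than in Berlin.')

whole, best = paragraph_scores(query, paragraph)
print(f'whole paragraph: {whole:.3f}, best sentence: {best:.3f}')
```

Running both scores over pairs whose similarity has been judged by hand is one way to see which granularity tracks that judgment more closely for a given dataset.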

Related question on Stack Overflow: How to compute the similarity between two text documents?

Topic semantic-similarity nlp python similarity

Category Data Science