How to choose similarity measurement between sentences and paragraphs
Problems
1. How to find an appropriate measurement method
There are several ways to measure sentence similarity, but I have no idea how to choose an appropriate method among them for my data (sentences).
Related question on Stack Overflow: is there a way to check similarity between two full sentences in python?
2. Sentence-based or paragraph-based
If both a sentence and the paragraph that contains it are available, which gives a more accurate similarity measurement: comparing sentences or comparing paragraphs?
What I tried so far
1. I've tried using one of the libraries (spaCy) to measure the similarity.
However, I'm struggling to find a more accurate method to measure similarities.
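import spacy

# Load the small English pipeline (install it first with:
#   python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')

original = 'New York is a noisy city where hamburgers are famous.'
test = ['Berlin is a nostalgic city where sausages are famous.', 'Both New York and Belin are noisy cities, but hamburgers are famous in New York rather than in Berlin.']

doc1 = nlp(original)
for sentence in test:
    doc2 = nlp(sentence)
    # By default, Doc.similarity is the cosine similarity of the two document vectors
    print(doc1.similarity(doc2))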
0.8682034221008
0.5078180005337849
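For comparison with the spaCy scores above, here is a minimal sketch of two purely lexical baselines, Jaccard overlap and cosine similarity over raw word counts, applied to the same sentences. It uses only the standard library; the helper names `tokenize`, `jaccard` and `cosine_bow` are my own illustrative choices, not part of any library, and the regex tokenizer is a deliberately crude assumption.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and extract alphabetic word tokens (crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def jaccard(a, b):
    """Jaccard similarity: shared unique tokens divided by all unique tokens."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def cosine_bow(a, b):
    """Cosine similarity between raw term-count (bag-of-words) vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

original = 'New York is a noisy city where hamburgers are famous.'
test = ['Berlin is a nostalgic city where sausages are famous.',
        'Both New York and Belin are noisy cities, but hamburgers are famous in New York rather than in Berlin.']

for sentence in test:
    print(round(jaccard(original, sentence), 3),
          round(cosine_bow(original, sentence), 3))
```

These scores only reflect shared surface words, so they are a reference point rather than a measure of meaning; comparing them with the vector-based scores on a handful of pairs I can judge by hand is one way I could check which measure fits my data.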
2. As with sentences, I found that there are several methods to measure the similarity between paragraphs.
But I have no clue which is better (generally higher-performing): comparing on a sentence basis or on a paragraph basis.
Related question on Stack Overflow: How to compute the similarity between two text documents?
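To make the sentence-versus-paragraph question concrete, here is a minimal sketch that scores the same query sentence against a paragraph in two ways: against the whole paragraph, and against its best-matching sentence. The paragraph text is made up for illustration, the sentence splitter is a naive regex, and the function names (`bow`, `cosine`, `split_sentences`, `paragraph_similarity`, `best_sentence_similarity`) are hypothetical helpers, not library calls.

```python
import math
import re
from collections import Counter

def bow(text):
    """Bag-of-words term counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(ca, cb):
    """Cosine similarity between two term-count vectors."""
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]

def paragraph_similarity(text_a, text_b):
    """Compare the two texts as single units."""
    return cosine(bow(text_a), bow(text_b))

def best_sentence_similarity(sentence, paragraph):
    """Compare the sentence with each sentence of the paragraph; keep the best score."""
    return max((cosine(bow(sentence), bow(s)) for s in split_sentences(paragraph)),
               default=0.0)

original = 'New York is a noisy city where hamburgers are famous.'
# Made-up paragraph: one related sentence followed by unrelated ones.
paragraph = ('Berlin is a nostalgic city where sausages are famous. '
             'The public transport runs all night. '
             'Many visitors come for the museums.')

print('whole paragraph:', round(paragraph_similarity(original, paragraph), 3))
print('best sentence:  ', round(best_sentence_similarity(original, paragraph), 3))
```

In this toy case the unrelated sentences dilute the whole-paragraph score, while the best-sentence score stays at the level of the single related sentence. Whether that dilution is desirable depends on whether the surrounding context is meant to count as part of the comparison.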
Topic semantic-similarity nlp python similarity
Category Data Science