Which phrase should be returned in case of multiple matches when comparing text?
I want to compare one sentence to some other sentences using the Bag of Words model. Suppose that my comparing sentence is:
I am playing football
and there are three other sentences that I want to compare it with:
1. and I am playing Cricket
2. Why do you play Cricket
3. I love playing Cricket when I am at school
Now, if I compare my comparing sentence to the three sentences above by counting shared words, sentences 1 and 3 each share the same number of words with it: 3 (I, am, playing). Sentence 2 shares none, because "play" does not match "playing" unless the words are stemmed.
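To make the counting concrete, here is a minimal sketch of the word-overlap comparison. It assumes case-insensitive matching, whitespace tokenization, and no stemming; the helper names `tokenize` and `shared_word_count` are made up for illustration.

```python
from collections import Counter

def tokenize(sentence):
    # Assumption: lowercase and split on whitespace; no stemming,
    # so "play" and "playing" are treated as different words.
    return sentence.lower().split()

def shared_word_count(query, candidate):
    # Bag-of-words overlap: a shared word counts as many times as it
    # appears in both sentences (multiset intersection).
    q = Counter(tokenize(query))
    c = Counter(tokenize(candidate))
    return sum((q & c).values())

query = "I am playing football"
candidates = [
    "and I am playing Cricket",
    "Why do you play Cricket",
    "I love playing Cricket when I am at school",
]

scores = [shared_word_count(query, s) for s in candidates]
print(scores)  # [3, 0, 3]
```

Under these assumptions, the first and third candidates tie at 3 shared words.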
Now the question is: which sentence is more related to my comparing sentence in this case? No semantic meaning is involved at all.
I have seen it suggested in some places that the least convoluted option is to return the shortest sentence in this case. What are your thoughts?
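For reference, the shortest-sentence tie-break could be sketched as below. This is only one possible reading of that suggestion, assuming whitespace tokenization and case-insensitive matching; `overlap` and `best_match` are hypothetical names. One rationale for it is that a shorter sentence has fewer unmatched words, so the matched words make up a larger share of it (3 of 5 words versus 3 of 9 here).

```python
def overlap(query, candidate):
    # Number of distinct words the two sentences have in common
    # (assumption: lowercase, whitespace tokenization, no stemming).
    return len(set(query.lower().split()) & set(candidate.lower().split()))

def best_match(query, candidates):
    # Rank by overlap first; among ties, prefer the sentence with
    # the fewest words (negated length so that max() picks the shortest).
    return max(candidates, key=lambda s: (overlap(query, s), -len(s.split())))

query = "I am playing football"
candidates = [
    "and I am playing Cricket",
    "Why do you play Cricket",
    "I love playing Cricket when I am at school",
]

best = best_match(query, candidates)
print(best)  # and I am playing Cricket
```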
Topic bag-of-words text text-mining
Category Data Science