How to calculate semantic similarity between video captions?
I want to measure the accuracy of a generated caption by comparing it to several reference sentences.
For example, here are the captions for one video. All of them describe the same video, but the reference sentences are split by video segment.
Reference sentences (R):
A man is walking along while pushing his bicycle.
He tries to balance himself by taking support from a pole.
Then he falls on the sidewalk along with the pole and the bicycle with him.
Generated candidate caption (C):
A person is trying to use a pole to push off his bike ride but ends up falling down.
I want to calculate a similarity score between each pair.
That is, (R1, C), (R2, C) and (R3, C).
What is the best method?
I tried TF-IDF followed by cosine similarity, but that only captures exact word matches. I want to measure both lexical and semantic similarity between these sentences to estimate how accurately sentence C has been written.
You can refer to the code I have written so far here
I understand that I need to tokenize, compute word embeddings, do some semantic analysis and then apply a similarity metric, but I am not sure in which order to do this, or which algorithm is best suited to each step.
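To make the limitation concrete, here is a minimal sketch of that kind of TF-IDF plus cosine similarity baseline. It is not the linked code: it uses only the Python standard library, the function names (`tokenize`, `tfidf_vectors`, `cosine`) are illustrative assumptions, and the IDF weighting is one common smoothed variant, ln((1 + N) / (1 + df)) + 1.

```python
import math
import re
from collections import Counter

references = [
    "A man is walking along while pushing his bicycle.",
    "He tries to balance himself by taking support from a pole.",
    "Then he falls on the sidewalk along with the pole and the bicycle with him.",
]
candidate = (
    "A person is trying to use a pole to push off his bike ride "
    "but ends up falling down."
)


def tokenize(text):
    # Lowercase and keep runs of letters only (a deliberately simple tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    # Build one sparse {term: weight} dictionary per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (an assumed, common variant).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dictionaries.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = tfidf_vectors(references + [candidate])
candidate_vec = vectors[-1]
scores = [cosine(ref_vec, candidate_vec) for ref_vec in vectors[:-1]]

for i, score in enumerate(scores, start=1):
    print(f"(R{i}, C): {score:.3f}")
```

Because each distinct word gets its own dimension, "bicycle" in R1 and "bike" in C contribute nothing to the score, and "falls" versus "falling" are likewise treated as unrelated. The scores come only from shared surface tokens such as "a", "his" and "pole".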
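One possible ordering of those steps, shown as a rough sketch rather than a recommendation: tokenize, look up a vector for each word, average the word vectors into a sentence vector, then compare sentence vectors with cosine similarity. The 3-dimensional vectors in `TOY_EMBEDDINGS` are invented purely for illustration and do not come from any real model; in practice they would be replaced by pretrained word or sentence embeddings. All names here are hypothetical.

```python
import math
import re

# Hypothetical toy vectors, made up for this example only.
# Related words are deliberately given nearby vectors.
TOY_EMBEDDINGS = {
    "bicycle": [0.90, 0.10, 0.00],
    "bike": [0.85, 0.15, 0.00],
    "man": [0.10, 0.90, 0.00],
    "person": [0.15, 0.85, 0.05],
    "falls": [0.00, 0.10, 0.90],
    "falling": [0.00, 0.15, 0.85],
    "pole": [0.50, 0.00, 0.50],
}


def tokenize(text):
    # Step 1: lowercase and split into alphabetic tokens.
    return re.findall(r"[a-z]+", text.lower())


def sentence_vector(text, embeddings):
    # Steps 2 and 3: look up each known word, then average the vectors.
    vecs = [embeddings[t] for t in tokenize(text) if t in embeddings]
    if not vecs:
        return None  # no known words, so no sentence vector
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    # Step 4: cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(a, b, embeddings=TOY_EMBEDDINGS):
    va = sentence_vector(a, embeddings)
    vb = sentence_vector(b, embeddings)
    if va is None or vb is None:
        return 0.0
    return cosine(va, vb)


references = [
    "A man is walking along while pushing his bicycle.",
    "He tries to balance himself by taking support from a pole.",
    "Then he falls on the sidewalk along with the pole and the bicycle with him.",
]
candidate = (
    "A person is trying to use a pole to push off his bike ride "
    "but ends up falling down."
)

pair_scores = [similarity(ref, candidate) for ref in references]
for i, score in enumerate(pair_scores, start=1):
    print(f"(R{i}, C): {score:.3f}")
```

Unlike exact word matching, this gives "A man falls" and "A person is falling" a high score even though they share no embedded words, because the toy vectors place "man" near "person" and "falls" near "falling". Averaging ignores word order, so it is only a starting point.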
Topic semantic-similarity tokenization word-embeddings nlp python
Category Data Science