How to calculate semantic similarity between video captions?

I intend to calculate the accuracy of a generated caption by comparing it to a number of reference sentences.

For example, here are the captions for one video. They all describe the same video, but the reference sentences are split up according to different segments of the video.

Reference sentences (R):

A man is walking along while pushing his bicycle.
He tries to balance himself by taking support from a pole.
Then he falls on the sidewalk along with the pole and the bicycle with him.

Candidate Caption generated (C):

A person is trying to use a pole to push off his bike ride but ends up falling down.

I want to calculate a similarity score for each pair, that is, (R1, C), (R2, C) and (R3, C).

What is the best method?

I tried TF-IDF followed by cosine similarity. However, that only captures word matching. I want to measure both lexical and semantic accuracy between these sentences to estimate how accurately sentence C has been written.

You can refer to the code I have written so far here

I understand that I need to tokenize, apply word embeddings, do some semantic analysis and then use a similarity metric, but I am not sure in which order, or which algorithm is best suited to each step.

Topic semantic-similarity tokenization word-embeddings nlp python

Category Data Science


There isn't exactly one best solution; it mostly depends on experimental results, so you may need to try several different approaches to find out what works best. But one common way (to my knowledge) is to use word embeddings (add an embedding layer, or use GloVe, BERT etc.) to get a vector for each word, and then combine the word vectors in each sentence into a sentence representation. Averaging them is a common choice, since plain concatenation gives vectors of different lengths for sentences of different lengths. You can then calculate the similarity between the sentence representations, for example with cosine similarity.

I haven't actually tested this in practice :p, I have only read about it in papers. So treat this approach as a reference only.
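
A minimal sketch of the embedding idea above, for illustration only. It assumes the word vectors are averaged into a sentence vector (so sentences of different lengths give vectors of the same size) and compared with cosine similarity. The 3-dimensional vectors in `TOY_VECTORS` are invented for this example and are not real embeddings; in practice they would come from a pretrained model such as GloVe. The helper names (`tokenize`, `sentence_vector`, `cosine`, `score_pairs`) are hypothetical as well.

```python
import math

# Toy 3-dimensional "embeddings", invented purely for illustration.
# Real vectors would be loaded from a pretrained model such as GloVe.
TOY_VECTORS = {
    "man": [0.9, 0.1, 0.0],
    "person": [0.8, 0.2, 0.0],
    "bicycle": [0.1, 0.9, 0.0],
    "bike": [0.1, 0.8, 0.1],
    "pole": [0.0, 0.2, 0.9],
    "falls": [0.3, 0.0, 0.7],
    "falling": [0.3, 0.1, 0.7],
}


def tokenize(sentence):
    """Lowercase, split on whitespace and strip basic punctuation."""
    tokens = (word.strip(".,!?").lower() for word in sentence.split())
    return [token for token in tokens if token]


def sentence_vector(sentence, vectors):
    """Average the vectors of all known words; unknown words are skipped."""
    known = [vectors[t] for t in tokenize(sentence) if t in vectors]
    if not known:
        return None  # no word of the sentence has a vector
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_pairs(references, candidate, vectors=TOY_VECTORS):
    """Return one similarity score per (reference, candidate) pair."""
    c_vec = sentence_vector(candidate, vectors)
    scores = []
    for ref in references:
        r_vec = sentence_vector(ref, vectors)
        if r_vec is None or c_vec is None:
            scores.append(0.0)  # nothing to compare
        else:
            scores.append(cosine(r_vec, c_vec))
    return scores


references = [
    "A man is walking along while pushing his bicycle.",
    "He tries to balance himself by taking support from a pole.",
    "Then he falls on the sidewalk along with the pole and the bicycle with him.",
]
candidate = ("A person is trying to use a pole to push off his bike ride "
             "but ends up falling down.")

scores = score_pairs(references, candidate)
for i, score in enumerate(scores, start=1):
    print(f"(R{i}, C): {score:.3f}")
```

With real pretrained vectors the structure stays the same; only `TOY_VECTORS` would be replaced by the loaded embeddings. Averaging ignores word order, so this is a baseline to compare other approaches against rather than a final metric.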
