Document Similarity with User Preference
To measure the similarity between two documents, one can use, e.g., TF-IDF with cosine similarity. Suppose that after calculating the similarity scores of Doc A against a list of documents (Doc B, Doc C, ...), we get:
Document Pair | Similarity Score |
---|---|
Doc A vs. Doc B | 0.45 |
Doc A vs. Doc C | 0.30 |
Doc A vs. ... | ... |
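
For concreteness, the baseline I have in mind looks roughly like the sketch below. It is a minimal, pure-Python illustration, not a reference implementation: the toy documents, the whitespace tokenizer, the smoothed IDF variant, and the helper names `tfidf_vectors` and `cosine` are all assumptions made up for this example. In practice a library such as scikit-learn would do this step.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumed weighting: raw term count times a smoothed IDF.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented purely for illustration.
docs = {
    "Doc A": "machine learning for text similarity",
    "Doc B": "machine learning for text classification",
    "Doc C": "user preference and text ranking",
}
names = list(docs)
vectors = dict(zip(names, tfidf_vectors(list(docs.values()))))

# Score Doc A against every other document and print the ranking.
scores = {
    name: cosine(vectors["Doc A"], vectors[name])
    for name in names if name != "Doc A"
}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"Doc A vs. {name}: {score:.2f}")
```

The scores from this toy corpus will not match the table above; the point is only that the ranking is driven by term overlap alone.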
By these scores, Doc B is the closest match to Doc A. But what if the users, as humans, think Doc C should be chosen as the closest? How can we factor in user preference, so that when the users run the algorithm again, the score of Doc A vs. Doc C will be higher than that of Doc A vs. Doc B? Put simply, besides computing TF-IDF/cosine similarity, the algorithm should also take into account each user's history of choices and suggest the document that best satisfies that specific user.
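
To make the desired behaviour concrete, here is a rough sketch of one naive way it could work: add a per-user bonus to the similarity score, proportional to how often that user picked a candidate for the same query document. Everything here is an assumption for illustration: the class name `PreferenceReranker`, the additive blending, and the `weight` of 0.2 are invented, and this is not meant as the right answer, only as a description of the effect I am after.

```python
from collections import defaultdict


class PreferenceReranker:
    """Blend a similarity score with a user's past choices (hypothetical)."""

    def __init__(self, weight=0.2):
        self.weight = weight            # assumed strength of the feedback bonus
        self.picks = defaultdict(int)   # (user, query, candidate) -> times chosen
        self.totals = defaultdict(int)  # (user, query) -> total recorded choices

    def record_choice(self, user, query, chosen):
        """Store that `user` picked `chosen` as the closest match to `query`."""
        self.picks[(user, query, chosen)] += 1
        self.totals[(user, query)] += 1

    def score(self, user, query, candidate, similarity):
        """Similarity plus weight times the fraction of past picks."""
        total = self.totals[(user, query)]
        preference = self.picks[(user, query, candidate)] / total if total else 0.0
        return similarity + self.weight * preference

    def rank(self, user, query, similarities):
        """Return (candidate, blended score) pairs, best first."""
        blended = {
            cand: self.score(user, query, cand, sim)
            for cand, sim in similarities.items()
        }
        return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)


# Similarity scores taken from the table above.
similarities = {"Doc B": 0.45, "Doc C": 0.30}
reranker = PreferenceReranker(weight=0.2)

before = reranker.rank("alice", "Doc A", similarities)  # Doc B ranks first
reranker.record_choice("alice", "Doc A", "Doc C")        # alice prefers Doc C
after = reranker.rank("alice", "Doc A", similarities)   # Doc C now ranks first
other = reranker.rank("bob", "Doc A", similarities)     # bob is unaffected
```

With these numbers, one recorded choice lifts Doc C to 0.30 + 0.2 = 0.50 for that user, above Doc B at 0.45, while a user with no history still sees the plain similarity ranking.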
I'm open to any technique, not only TF-IDF/cosine similarity. It would be great if there were also a readily available implementation, e.g. in Python.
Topic semantic-similarity similar-documents cosine-distance tfidf similarity
Category Data Science