Document Similarity with User Preference

To measure the similarity between two documents, one can use, for example, TF-IDF vectors compared with cosine similarity. Suppose that, after calculating the similarity scores of Doc A against a list of documents (Doc B, Doc C, ...), we get:

Document Pair        Similarity Score
Doc A vs. Doc B      0.45
Doc A vs. Doc C      0.30
Doc A vs. ...        ...
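For reference, scores like these can be produced with a few lines of standard-library Python. This is a minimal sketch, assuming whitespace tokenization and the smoothed IDF formula that scikit-learn's TfidfVectorizer uses by default; the function names (tfidf_vectors, cosine, similarities_to) are hypothetical helpers, not a library API. In practice, scikit-learn's TfidfVectorizer together with sklearn.metrics.pairwise.cosine_similarity does the same job.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarities_to(query_index, docs):
    """Score every other document against docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    return {i: cosine(query, vec) for i, vec in enumerate(vectors) if i != query_index}


docs = ["the cat sat on the mat", "the cat lay on the rug", "dogs chase cats in the park"]
scores = similarities_to(0, docs)
print(scores)  # document 1 shares "the", "cat", "on" with document 0, so it outscores document 2
```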

Doc B is clearly the closest to Doc A in terms of similarity. But what if users, as humans, think Doc C should be chosen as the closest? In other words, how can we factor in user preference so that, when users run the algorithm again later, the score of Doc A vs. Doc C is higher than that of Doc A vs. Doc B? Put simply, besides the TF-IDF/cosine similarity calculation, the algorithm should also take into account the user's history of choices and suggest the document that best satisfies that specific user.
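One simple way to get this behaviour (an assumption, not something the question prescribes) is to re-rank with a weighted blend: score = alpha * similarity + (1 - alpha) * preference, where preference is the fraction of times this user picked a candidate for the same query document. The class PreferenceReranker and its methods are hypothetical names used only for illustration.

```python
from collections import Counter, defaultdict


class PreferenceReranker:
    """Hypothetical helper: blends content similarity with one user's past choices."""

    def __init__(self, alpha=0.7):
        # alpha weights content similarity; (1 - alpha) weights user preference.
        self.alpha = alpha
        # history[query_doc] counts how often this user picked each candidate.
        self.history = defaultdict(Counter)

    def record_choice(self, query_doc, chosen_doc):
        self.history[query_doc][chosen_doc] += 1

    def preference(self, query_doc, candidate):
        """Fraction of this user's picks for query_doc that went to candidate."""
        counts = self.history[query_doc]
        total = sum(counts.values())
        return counts[candidate] / total if total else 0.0

    def rank(self, query_doc, similarities):
        """similarities: dict candidate -> cosine score. Returns candidates, best first."""
        scored = {
            cand: self.alpha * sim + (1 - self.alpha) * self.preference(query_doc, cand)
            for cand, sim in similarities.items()
        }
        return sorted(scored, key=scored.get, reverse=True)


reranker = PreferenceReranker(alpha=0.7)
sims = {"Doc B": 0.45, "Doc C": 0.30}
print(reranker.rank("Doc A", sims))  # ['Doc B', 'Doc C']
reranker.record_choice("Doc A", "Doc C")
print(reranker.rank("Doc A", sims))  # ['Doc C', 'Doc B']
```

With alpha = 0.7 a single pick is enough to flip the order here, because the preference gap (0.3 * 1.0) outweighs the weighted similarity gap (0.7 * 0.15); a larger alpha makes the ranking more conservative.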

I'm open to any techniques besides TF-IDF/cosine similarity. It would be great if there were also a readily available implementation, e.g. in Python.



These are two different things:

  • document similarity is based only on the documents
  • users' previous choices can be used to train a recommender system, or simply be applied in a rule-based fashion.
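As an illustration of the rule-based option, here is a minimal sketch under one assumed rule: the document this user picked most often for the same query goes first, and the remaining candidates follow in descending similarity order. The function rule_based_rank and its parameters are hypothetical.

```python
from collections import Counter


def rule_based_rank(similarities, past_choices):
    """
    similarities: dict candidate -> similarity score for one query document.
    past_choices: candidates this user picked earlier for the same query document.
    Assumed rule: the most frequent past pick goes first, the rest follow by similarity.
    """
    by_similarity = sorted(similarities, key=similarities.get, reverse=True)
    # Ignore past picks that are no longer among the current candidates.
    valid = Counter(c for c in past_choices if c in similarities)
    if not valid:
        return by_similarity  # no usable history: plain similarity order
    top_pick, _ = valid.most_common(1)[0]
    return [top_pick] + [c for c in by_similarity if c != top_pick]


sims = {"Doc B": 0.45, "Doc C": 0.30, "Doc D": 0.10}
print(rule_based_rank(sims, ["Doc C"]))  # ['Doc C', 'Doc B', 'Doc D']
```

A rule like this is easy to explain to users, but it ignores how strong the similarity signal is; a trained recommender can weigh the two instead.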

The two can be combined in a user-specific recommender system, but keep in mind that users' choices are not necessarily consistent, not even for a single user. This is why one cannot assume that a selected choice means "more similar document".
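One way to respect that inconsistency (an assumption on my part, not a method stated above) is to treat choices as noisy implicit feedback: estimate how often a user picks a candidate when it is shown, shrink that estimate toward a prior with a Beta-prior posterior mean, and only then blend it with similarity. The function names and the default prior values (prior_rate=0.1, prior_strength=5.0, weight=0.5) are hypothetical and would need tuning on real data.

```python
def smoothed_pick_rate(picks, shown, prior_rate=0.1, prior_strength=5.0):
    """
    Posterior mean of a Beta(prior_rate * k, (1 - prior_rate) * k) prior, k = prior_strength,
    after observing `picks` selections out of `shown` impressions.
    Few observations -> estimate stays near prior_rate; many -> approaches picks / shown.
    """
    if not 0 <= picks <= shown:
        raise ValueError("picks must be between 0 and shown")
    return (picks + prior_rate * prior_strength) / (shown + prior_strength)


def blended_score(similarity, picks, shown, weight=0.5, **prior):
    # Content similarity always contributes, so noisy feedback only nudges the ranking.
    return (1 - weight) * similarity + weight * smoothed_pick_rate(picks, shown, **prior)


print(round(smoothed_pick_rate(1, 1), 3))  # 0.25, versus a raw pick rate of 1.0
print(blended_score(0.30, 4, 5) > blended_score(0.45, 0, 5))  # True
```

The shrinkage means a single, possibly accidental pick moves the estimate only modestly, while a consistent pattern over many impressions can still overturn the similarity order.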

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.