Document Similarity with User Preference

Question

JoyfulPanda

March 17, 2022, 02:30

To measure the similarity between two documents, one can use, e.g., TF-IDF with cosine similarity. Suppose that after calculating the similarity scores of Doc A against a list of documents (Doc B, Doc C, ...), we get:

Document Pair	Similarity Score
Doc A vs. Doc B	0.45
Doc A vs. Doc C	0.30
Doc A vs. ...	...
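
For readers who want to see how scores like these come about, here is a minimal sketch of TF-IDF weighting followed by cosine similarity, using only the Python standard library. The toy corpus, the whitespace tokenizer, and the helper names `tfidf_vectors` and `cosine` are assumptions made for illustration; they are not from the question, and the resulting numbers will not match the table above. In practice a library such as scikit-learn (`TfidfVectorizer` plus `cosine_similarity`) would likely be used instead.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (raw TF times a smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus; the first entry plays the role of Doc A.
docs = {
    "Doc A": "the cat sat on the mat",
    "Doc B": "the cat lay on the rug",
    "Doc C": "stock prices fell sharply today",
}
names = list(docs)
vectors = tfidf_vectors(list(docs.values()))
# Compare Doc A (index 0) against every other document.
scores = {name: cosine(vectors[0], vec) for name, vec in zip(names[1:], vectors[1:])}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"Doc A vs. {name}: {score:.2f}")
```

Note that these scores depend only on the text of the documents; nothing about any user enters the computation.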

Doc B appears to be the closest match to Doc A in terms of similarity. But what if users, as humans, think Doc C should be chosen as the closest? How can we factor in user preference, so that when the same users run the algorithm again, the score of Doc A vs. Doc C is higher than that of Doc A vs. Doc B? Put simply, besides computing TF-IDF/cosine similarity, the algorithm should also take into account the user's history of choices and suggest the document that best satisfies that specific user.

I'm open to other techniques besides TF-IDF/cosine similarity. It would be great if there were also a readily available implementation, e.g. in Python.

Topic semantic-similarity similar-documents cosine-distance tfidf similarity

Category Data Science

Erwan · Accepted Answer · March 17, 2022, 02:30

These are two different things:

Document similarity is based only on the documents themselves.
Previous users' choices can be used to train a recommender system, or simply applied in a rule-based fashion.

The two can be combined in a user-specific recommender system, but note that users' choices are not necessarily consistent, not even for a single user. This is why it cannot be assumed that a selected document is a "more similar" document.
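
As a rough illustration of the rule-based option, the sketch below keeps document similarity and user preference as two separate signals and blends them only at ranking time. The class `PreferenceReranker`, its methods, the weight `alpha`, and the user names are all hypothetical and chosen for illustration; the similarity values are the ones from the question's table. This is one simple way to do it under those assumptions, not a trained recommender system.

```python
from collections import defaultdict

class PreferenceReranker:
    """Blend a content-similarity score with a per-user choice history.

    Hypothetical helper for illustration. `alpha` (an assumed parameter)
    sets how much weight past choices get relative to similarity.
    """

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        # (user, query_doc, candidate_doc) -> times the user picked that candidate
        self.picks = defaultdict(int)
        # (user, query_doc) -> total number of recorded choices
        self.totals = defaultdict(int)

    def record_choice(self, user, query_doc, chosen_doc):
        """Store one observed choice made by `user` for `query_doc`."""
        self.picks[(user, query_doc, chosen_doc)] += 1
        self.totals[(user, query_doc)] += 1

    def preference(self, user, query_doc, candidate_doc):
        """Fraction of this user's past choices for `query_doc` that went to the candidate."""
        total = self.totals.get((user, query_doc), 0)
        if total == 0:
            return 0.0  # no history: preference contributes nothing
        return self.picks.get((user, query_doc, candidate_doc), 0) / total

    def rerank(self, user, query_doc, similarities):
        """Return (doc, blended_score) pairs, best first."""
        blended = {
            doc: (1 - self.alpha) * sim
            + self.alpha * self.preference(user, query_doc, doc)
            for doc, sim in similarities.items()
        }
        return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)

# Similarity scores taken from the table in the question.
similarities = {"Doc B": 0.45, "Doc C": 0.30}

reranker = PreferenceReranker(alpha=0.5)
reranker.record_choice("alice", "Doc A", "Doc C")  # alice preferred Doc C

ranked_alice = reranker.rerank("alice", "Doc A", similarities)
ranked_bob = reranker.rerank("bob", "Doc A", similarities)  # bob has no history
print(ranked_alice[0][0])  # top suggestion for alice
print(ranked_bob[0][0])    # top suggestion for bob
```

Because the similarity scores themselves are never modified, an inconsistent choice history only shifts the ranking for that user, and a smaller `alpha` limits how far it can shift.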
