Document matching with more priority to certain features than others

Question

Reckoner

March 18, 2022, 02:03

I am working on a recommendation system in which I need to measure the similarity of two users. I know that I can use a TF-IDF vectorizer to calculate the cosine similarity score between them. But now suppose some of my features have different priorities: each feature has its own priority, and the one with the higher priority should be checked first. Once I have the cosine similarity based on that feature, I move on to the next feature, and so on. How can I achieve this?

Topic cosine-distance recommender-system

Category Data Science

akode · Accepted Answer · June 25, 2019, 11:45

If you don't want all features to have the same weight in the cosine similarity, you could use a weighted cosine similarity.

Cosine similarity is calculated as

$\text{similarity} = \frac{\vec A\cdot \vec B}{\Vert \vec A\Vert\cdot\Vert \vec B\Vert}$
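To make the formula concrete, here is a minimal pure-Python sketch of plain cosine similarity. The helper `cosine_similarity` (not scikit-learn's function of the same name) and the two user vectors are hypothetical; in practice the vectors would come from something like the TF-IDF vectorizer mentioned in the question.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Plain cosine similarity: (A . B) / (||A|| * ||B||)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))       # A . B
    norm_a = math.sqrt(sum(x * x for x in a))    # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))    # ||B||
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical feature vectors for two users
user_a = [1.0, 0.0, 2.0]
user_b = [2.0, 1.0, 2.0]
print(round(cosine_similarity(user_a, user_b), 4))  # 0.8944
```

Every feature contributes to the dot product on equal terms here, which is exactly what the weighting below changes.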

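Now, if you multiply each component $a_i$ of $\vec A$ and $b_i$ of $\vec B$ by a weight $w_i$, you get a weighted cosine similarity:

$\text{similarity}_w = \frac{\sum_i (w_i a_i)(w_i b_i)}{\sqrt{\sum_i (w_i a_i)^2}\cdot\sqrt{\sum_i (w_i b_i)^2}}$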
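A minimal sketch of that weighting, assuming one weight per feature: scale both vectors component-wise by $w_i$, then take the ordinary cosine similarity. The function name `weighted_cosine_similarity` and the example weights are hypothetical.

```python
import math


def weighted_cosine_similarity(a: list[float], b: list[float], w: list[float]) -> float:
    """Cosine similarity after scaling each component a_i and b_i by w_i."""
    if not (len(a) == len(b) == len(w)):
        raise ValueError("a, b and w must have the same length")
    wa = [wi * ai for wi, ai in zip(w, a)]       # w_i * a_i
    wb = [wi * bi for wi, bi in zip(w, b)]       # w_i * b_i
    dot = sum(x * y for x, y in zip(wa, wb))     # sum of w_i^2 * a_i * b_i
    norm_a = math.sqrt(sum(x * x for x in wa))
    norm_b = math.sqrt(sum(y * y for y in wb))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("undefined when a weighted vector is all zeros")
    return dot / (norm_a * norm_b)


user_a = [1.0, 0.0, 2.0]
user_b = [2.0, 1.0, 2.0]
weights = [1.0, 0.5, 0.25]  # hypothetical: first feature matters most
print(round(weighted_cosine_similarity(user_a, user_b, weights), 4))  # 0.9487
```

With all weights equal to 1 this reduces to the plain cosine similarity. Note that because each component is scaled by $w_i$, the weight enters the dot product squared ($w_i^2 a_i b_i$); if you want it to enter linearly, scale by $\sqrt{w_i}$ instead.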

Which weights you should use depends on the application. If you have a ranking $\vec R$ of your features, where $r_i$ is the rank of feature $i$, you could try $w_i = \frac{1}{r_i}$ or $w_i = \frac{1}{r_i^2}$. The feature with rank 1 gets a weight of 1 in both cases. In principle, any measure of feature importance can serve as the weights.
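Turning a feature ranking into weights could look like the sketch below, assuming ranks are positive integers with 1 meaning highest priority. The function `rank_weights` and its `power` parameter are hypothetical: `power=1` gives $\frac{1}{r_i}$ and `power=2` gives $\frac{1}{r_i^2}$.

```python
def rank_weights(ranks: list[int], power: int = 1) -> list[float]:
    """Weight 1 / r_i**power for each feature rank r_i (rank 1 = highest priority)."""
    if any(r < 1 for r in ranks):
        raise ValueError("ranks must be >= 1")
    return [1.0 / r**power for r in ranks]


ranks = [1, 2, 3]  # hypothetical ranking of three features
print(rank_weights(ranks))           # [1.0, 0.5, 0.3333333333333333]
print(rank_weights(ranks, power=2))  # [1.0, 0.25, 0.1111111111111111]
```

The resulting list can be used as the $w_i$ in the weighted cosine similarity. The squared variant decays faster, so lower-ranked features have less influence on the score.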
