Document matching with more priority to certain features than others

I am working on a recommendation system in which I need to measure the similarity of two users. I know that I can use a TfidfVectorizer to compute the cosine similarity score between them. But now suppose some of my features have different priorities: each feature has its own priority, and the one with the higher priority should be checked first. Once I get the cosine similarity based on that feature, I move on to the next feature, and so on. How can I achieve this?

Topic cosine-distance recommender-system



If you don't want all features to carry the same weight in the cosine similarity, you can use a weighted cosine similarity.

Cosine similarity is calculated as

$\text{similarity} = \frac{\vec A\cdot \vec B}{\Vert \vec A\Vert\,\Vert \vec B\Vert}$
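As a baseline, here is a minimal pure-Python sketch of this formula for two equal-length feature vectors. The helper name `cosine_similarity` is hypothetical (it is not scikit-learn's function of the same name), and returning 0.0 when either vector is all zeros is an assumed convention, since the formula is undefined there.

```python
import math

def cosine_similarity(a, b):
    """Plain (unweighted) cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Undefined for a zero vector; 0.0 is an assumed convention here.
        return 0.0
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 2, 0], [2, 4, 0]))  # parallel vectors: ≈ 1.0
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal vectors: 0.0
```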

Now, if you multiply each component $a_i$ of $\vec A$ and $b_i$ of $\vec B$ by a weight $w_i$ before applying this formula, you get a weighted cosine similarity.
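Written out, replacing $a_i$ with $w_i a_i$ and $b_i$ with $w_i b_i$ in the formula above gives the following. Each weight enters squared, so the effective importance of feature $i$ is $w_i^2$; if you want feature $i$ to count with exactly $w_i$, scale the components by $\sqrt{w_i}$ instead.

$$\text{similarity}_w = \frac{\sum_i (w_i a_i)(w_i b_i)}{\sqrt{\sum_i (w_i a_i)^2}\,\sqrt{\sum_i (w_i b_i)^2}} = \frac{\sum_i w_i^2 a_i b_i}{\sqrt{\sum_i w_i^2 a_i^2}\,\sqrt{\sum_i w_i^2 b_i^2}}$$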

Which weights you should use depends on the application. If you have a ranking $\vec R$ of your features, where $r_i$ is the rank of feature $i$ (rank 1 = highest priority), you could try $w_i = \frac{1}{r_i}$ or $w_i = \frac{1}{r_i^2}$. In both cases the feature with rank 1 gets a weight of 1. In principle, any measure of feature importance can serve as weights.
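A minimal sketch of this idea in pure Python, assuming each user is already represented as an equal-length numeric vector (for example, one column per feature) and that you know each feature's rank. The helpers `rank_weights` and `weighted_cosine_similarity` and the example user vectors are hypothetical, for illustration only.

```python
import math

def rank_weights(ranks, power=1):
    """Turn feature ranks (1 = highest priority) into weights 1 / r**power.

    power=1 gives 1/r_i, power=2 gives 1/r_i^2; rank 1 always gets weight 1.
    """
    if any(r < 1 for r in ranks):
        raise ValueError("ranks must be >= 1")
    return [1.0 / r ** power for r in ranks]

def weighted_cosine_similarity(a, b, w):
    """Cosine similarity after scaling component i of both vectors by w[i].

    Equivalent to sum(w_i^2 a_i b_i) / (sqrt(sum(w_i^2 a_i^2)) * sqrt(sum(w_i^2 b_i^2))).
    """
    if not (len(a) == len(b) == len(w)):
        raise ValueError("a, b and w must have the same length")
    wa = [wi * x for wi, x in zip(w, a)]
    wb = [wi * y for wi, y in zip(w, b)]
    dot = sum(x * y for x, y in zip(wa, wb))
    norm_a = math.sqrt(sum(x * x for x in wa))
    norm_b = math.sqrt(sum(y * y for y in wb))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for a zero vector; assumed convention
    return dot / (norm_a * norm_b)

# Hypothetical profiles over three features, ranked 1 (most important) to 3.
user_a = [1.0, 0.0, 1.0]
user_b = [1.0, 1.0, 0.0]
ranks = [1, 2, 3]
print(weighted_cosine_similarity(user_a, user_b, [1, 1, 1]))            # ≈ 0.5
print(weighted_cosine_similarity(user_a, user_b, rank_weights(ranks)))  # ≈ 0.849
```

The two users agree only on the rank-1 feature, so the rank-based weights raise their similarity from about 0.5 (equal weights) to about 0.85. Passing `power=2` to `rank_weights` makes the top-ranked feature dominate even more strongly.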
