Can I sum up feature vectors of a user’s collection?

I want to find items that are similar to items users already have in their collection. Every item has attributes, so I created feature vectors in which every element represents an attribute and is $1$ if the item has that attribute and $0$ otherwise.

For the user collection I summed up all vectors, creating one vector which I then used to calculate similarities with other items.

Is this a correct approach, or should I make this "user vector" binary like the other ones? Or is it easier to just calculate $n \times m$ (i.e. user items times new items) similarities?

The set of new items will consist of $\sim1000$ items, while the user collections tend to be $1000$. As the similarity function I used cosine distance, but I want to try the Pearson coefficient as well.

Topic vector-space-models cosine-distance similarity recommender-system

Category Data Science


You can use the total sum of the boolean values. That will be fast and give a general notion of similarity.
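As a rough illustration of the summing approach, here is a minimal sketch in plain Python. The data is made up, and the names `sum_vectors`, `cosine_similarity`, `collection`, and `candidates` are only illustrative, not taken from the question.

```python
import math


def sum_vectors(vectors):
    """Element-wise sum of equal-length binary feature vectors."""
    return [sum(column) for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical data: three owned items and two candidates, four attributes each.
collection = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
candidates = {
    "item_a": [1, 0, 0, 0],
    "item_b": [0, 0, 0, 1],
}

# One summed "user vector" for the whole collection.
user_vector = sum_vectors(collection)

# Score every candidate against the user vector and rank by similarity.
scores = {name: cosine_similarity(user_vector, vec) for name, vec in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Cosine similarity ignores vector length, so using the sum or the mean of the collection gives the same ranking. Turning the user vector back into a binary one does change the scores, because it throws away how often each attribute occurs in the collection.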

A more useful metric might be the Hamming distance: the number of positions at which two binary vectors differ (equivalently, the vector length minus the number of matching booleans).
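The Hamming distance counts the positions at which two binary vectors differ, so the number of matching positions is the vector length minus that distance. Since it needs binary vectors on both sides, one possible way to use it here is to compare a candidate with each owned item separately. A minimal sketch, assuming made-up data; `hamming_distance` and `nearest_owned_distance` are hypothetical helper names:

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_owned_distance(candidate, collection):
    """Smallest Hamming distance from the candidate to any owned item."""
    return min(hamming_distance(candidate, item) for item in collection)


# Hypothetical data: three owned items and one candidate, four attributes each.
collection = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
candidate = [0, 1, 1, 0]

# Distance to every owned item, then the closest one.
distances = [hamming_distance(candidate, item) for item in collection]
closest = nearest_owned_distance(candidate, collection)

# Matching positions with the first owned item.
matches_with_first = len(candidate) - distances[0]
```

A smaller distance means a more similar item. Taking the minimum over the collection is only one choice; the average distance would be another.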
