What is the best model for a recommendation system using implicit ratings?

I have a similarity matrix that looks like this:

I have a bunch of user vectors of 1s and 0s, where a 1 indicates that someone has clicked on an email (as part of a campaign) and a 0 indicates they haven't. As I have come to understand them, these are implicit ratings.

In terms of an approach, after researching different options, it would appear that a matrix-factorisation approach is the best place to start. My question is: what is the best algorithm to use as a baseline model for this type of problem? Are there any potential pitfalls I should watch out for? Any help would be appreciated. Thanks!



If you are after a baseline model, which I also strongly recommend, the matrix-factorization (MF) approach you mentioned (also known as a collaborative model) is the most basic one, and it is fast and easy to implement. After that you can explore others such as content-based models, hybrid recommenders, or even the recent ones based on the self-attention mechanism that Media mentioned. Starting this way gives you something to compare against during A/B tests if nothing is in place already. Just note that traditional MF may run into memory or performance issues if your matrix gets very big (at least if your resources are limited); in that case, Deep Matrix Factorization is a better equivalent choice.
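To make the baseline concrete, here is a minimal sketch of matrix factorization on a binary click matrix, written in plain Python so it runs without extra libraries. It is an illustration under assumptions, not a reference implementation: the function names (`train_implicit_mf`, `predict`, `recommend`), the toy `clicks` data, and all hyperparameters are made up for this example. It weights clicks more heavily than non-clicks (the `pos_weight` parameter), on the assumption that a 0 only means "no observed click" rather than "dislike".

```python
import random

def train_implicit_mf(clicks, n_factors=2, n_epochs=200, lr=0.05,
                      reg=0.01, pos_weight=3.0, seed=0):
    """Factorize a binary user x item matrix with weighted SGD.

    Every cell is treated as a target (1 = click, 0 = no click), but
    clicks get a larger weight because a 0 is weak evidence at best.
    """
    rng = random.Random(seed)
    n_users, n_items = len(clicks), len(clicks[0])
    # Small random initial user (P) and item (Q) factor vectors.
    P = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    cells = [(u, i) for u in range(n_users) for i in range(n_items)]
    for _ in range(n_epochs):
        rng.shuffle(cells)
        for u, i in cells:
            weight = pos_weight if clicks[u][i] else 1.0
            err = weight * (clicks[u][i] - predict(P, Q, u, i))
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, user, item):
    """Score = dot product of the user and item factor vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))

def recommend(P, Q, clicks, user, top_n=2):
    """Rank the items the user has not clicked yet by predicted score."""
    unseen = [i for i in range(len(Q)) if not clicks[user][i]]
    unseen.sort(key=lambda i: predict(P, Q, user, i), reverse=True)
    return unseen[:top_n]

# Toy data (hypothetical): rows are users, columns are email campaigns.
clicks = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
P, Q = train_implicit_mf(clicks)
print(recommend(P, Q, clicks, user=0))
```

Looping over every cell is only workable for a small matrix like this one. For a large, sparse click matrix you would normally sample the negatives or use an alternating-least-squares solver, which is where the memory and performance concerns mentioned above come in.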

Almost a year ago I summarized these three starting recommendation engines; see this answer. I have code snippets and notebooks, gathered from the references there, for practical implementations of these few baselines, which I might be able to share if needed.


There are different approaches for your case, but I believe you can employ approaches that are based on the self-attention mechanism. There are numerous studies showing that they perform better than the latent representations that can be obtained with SVD. For instance, you can see the SAIN paper, whose code is also open source. In the paper, the authors show that their approach, which is itself based on self-attention, performs better than both SVD and the usual self-attention mechanism.

To summarize the method: it uses the contents of the products and the items, finds the interactions between the words, and outputs the score that a user may assign to a product. The paper reports good performance.
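To give a feel for what "finding the interactions between the words" means, here is a minimal sketch of generic scaled dot-product self-attention in plain Python. This is not the SAIN model and makes no claim about its architecture: it uses identity query/key/value projections (a real model learns these), and the three two-dimensional word embeddings are invented for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Scaled dot-product self-attention with identity projections.

    Returns the attended vectors and the attention weights, where
    weights[i][j] says how strongly word i attends to word j.
    """
    d = len(embeddings[0])
    outputs, weights = [], []
    for query in embeddings:
        # Similarity of this word to every word, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
                  for key in embeddings]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted average of all word vectors.
        outputs.append([sum(w_j * vec[f] for w_j, vec in zip(w, embeddings))
                        for f in range(d)])
    return outputs, weights

# Hypothetical embeddings for three words from an email subject line.
words = ["discount", "sale", "invoice"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(embeddings)
for word, row in zip(words, weights):
    print(word, [round(x, 3) for x in row])
```

In this toy example, "discount" puts more weight on "sale" than on "invoice" because their vectors are closer. A content-based recommender would feed representations like `outputs` into a further layer that predicts the user's score.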

Since you have the contents of the emails, this approach may help you.
