Why does using a sigmoid in the last layer of a NN for Generalised Matrix Factorisation give better performance?

My understanding is that the example in the Keras documentation (https://keras.io/examples/structured_data/collaborative_filtering_movielens/) is a special case of Neural Collaborative Filtering called Generalised Matrix Factorisation (GMF), described in this paper: https://dl.acm.org/doi/abs/10.1145/3038912.3052569 .

The Keras example min-max normalises the ratings to [0, 1], uses a sigmoid activation in the last layer, and trains with binary cross-entropy (log loss). Converting the predictions back to the original rating scale gives an RMSE of just below 1.
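For concreteness, here is a minimal sketch of that setup as I understand it (the sizes `num_users`, `num_items`, `embedding_size` are placeholders, not the values from the example):

```python
import tensorflow as tf
from tensorflow import keras

num_users, num_items, embedding_size = 1000, 1700, 50  # placeholder sizes

class GMF(keras.Model):
    """Dot product of user/item embeddings plus biases, squashed by a sigmoid."""
    def __init__(self, num_users, num_items, embedding_size):
        super().__init__()
        self.user_embedding = keras.layers.Embedding(num_users, embedding_size)
        self.user_bias = keras.layers.Embedding(num_users, 1)
        self.item_embedding = keras.layers.Embedding(num_items, embedding_size)
        self.item_bias = keras.layers.Embedding(num_items, 1)

    def call(self, inputs):
        # inputs[:, 0] are user ids, inputs[:, 1] are item ids
        u = self.user_embedding(inputs[:, 0])
        i = self.item_embedding(inputs[:, 1])
        dot = tf.reduce_sum(u * i, axis=1, keepdims=True)
        x = dot + self.user_bias(inputs[:, 0]) + self.item_bias(inputs[:, 1])
        return tf.nn.sigmoid(x)  # output in (0, 1), matching the normalised ratings

model = GMF(num_users, num_items, embedding_size)
model.compile(loss=keras.losses.BinaryCrossentropy(),
              optimizer=keras.optimizers.Adam(learning_rate=0.001))

# Targets: ratings (1..5) min-max normalised to [0, 1] before training,
# e.g. y_norm = (y - y.min()) / (y.max() - y.min())
```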

As an alternative, I tried leaving the ratings unnormalised, using no activation function in the last layer, and training with mean-squared-error loss. This method gave an RMSE of around 2.
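A sketch of that alternative, under the same placeholder assumptions as above:

```python
import tensorflow as tf
from tensorflow import keras

num_users, num_items, embedding_size = 1000, 1700, 50  # placeholder sizes

class GMFRegression(keras.Model):
    """Same dot-product model, but with a linear output trained on raw ratings."""
    def __init__(self, num_users, num_items, embedding_size):
        super().__init__()
        self.user_embedding = keras.layers.Embedding(num_users, embedding_size)
        self.user_bias = keras.layers.Embedding(num_users, 1)
        self.item_embedding = keras.layers.Embedding(num_items, embedding_size)
        self.item_bias = keras.layers.Embedding(num_items, 1)

    def call(self, inputs):
        u = self.user_embedding(inputs[:, 0])
        i = self.item_embedding(inputs[:, 1])
        dot = tf.reduce_sum(u * i, axis=1, keepdims=True)
        # No sigmoid: the dot product plus biases is used directly as the rating.
        return dot + self.user_bias(inputs[:, 0]) + self.item_bias(inputs[:, 1])

model = GMFRegression(num_users, num_items, embedding_size)
model.compile(loss=keras.losses.MeanSquaredError(),
              optimizer=keras.optimizers.Adam(learning_rate=0.001))
# Trained directly on the 1..5 ratings; evaluated by RMSE on the same scale.
```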

The way the algorithm works is that it takes the dot product of the user and item embeddings as a similarity score (a cosine similarity scaled by the two vector norms), and uses gradient descent to find the optimal embeddings under the chosen loss function. My question is: why does the Keras example find better embeddings than my alternative?
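To make the similarity-score point explicit, a tiny numeric example with made-up vectors: the raw dot product the model optimises differs from cosine similarity exactly by the product of the two norms.

```python
import numpy as np

u = np.array([0.5, 1.0, -0.3])   # hypothetical user embedding
v = np.array([0.2, 0.8, 0.1])    # hypothetical item embedding

dot = u @ v
cosine = dot / (np.linalg.norm(u) * np.linalg.norm(v))

print(dot, cosine)  # dot product = cosine similarity * ||u|| * ||v||
```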

Topic matrix-factorisation

Category Data Science
