Understanding the factorisation machine formula

I am reading this tutorial about factorisation machines. I get the intuition behind it: compute the dot product between the (user/item), (item/aux feature), and (user/aux feature) pairs, and these dot products can impact $\hat{y}$. But I don't understand the formula below. I understand the first section, the bias. I understand the second section, the first-order weighting. But I don't understand the last section. I understand the $\langle v, v \rangle$ part: this could be the dot product of $\langle$user, item$\rangle$ or $\langle$item, aux feature$\rangle$. But what are the $x$'s? Would these …
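For reference, the standard FM prediction equation (Rendle, 2010), reconstructed here from memory rather than from the tutorial's exact notation, is:

```latex
\hat{y}(\mathbf{x}) \;=\; w_0 \;+\; \sum_{i=1}^{n} w_i x_i
\;+\; \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
```

The $x_i$ are the entries of the input feature vector itself (one-hot user id, one-hot item id, auxiliary features), so the product $x_i x_j$ switches a pairwise term $\langle \mathbf{v}_i, \mathbf{v}_j \rangle$ on only when both features are active in that row.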
Category: Data Science

Why does using a sigmoid in the last layer of a NN for General Matrix Factorisation show better performance?

My understanding is that the example shown in the Keras documentation https://keras.io/examples/structured_data/collaborative_filtering_movielens/ is the special case of a Neural Collaborative Filtering model called "General Matrix Factorisation" in this paper: https://dl.acm.org/doi/abs/10.1145/3038912.3052569 . The Keras example normalises the ratings, uses a sigmoid activation in the last layer, and trains with log loss. Converting the predictions back to ratings gives an RMSE of just below 1. One alternative I tried is to not normalise the ratings, use no activation function, and train with mean-squared-error loss. This method gave an RMSE of …
Category: Data Science

Extension of NMF to 3D

AFAIK, Non-Negative Matrix Factorization (NMF) is the procedure of looking for non-negative matrices $A$ and $B$ such that $$Data_{ik} = \sum_j A_{ij} B_{jk}$$ My data matrix is in fact 3D. I would like to fit the following model to my data: $$Data_{ikl} = \sum_j A_{ij} B_{jk} C_{jl}$$ It would be great to know if this model has a name and whether it is already implemented somewhere (preferably in Python).
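The three-factor model above is the non-negative CP/PARAFAC decomposition of a 3-way tensor; I believe TensorLy's `non_negative_parafac` implements it in Python. As a self-contained illustration (a sketch under Lee-Seung-style multiplicative updates, not a production solver):

```python
import numpy as np

def khatri_rao(U, V):
    # Column-wise Kronecker product: (I*K, J) from (I, J) and (K, J).
    I, J = U.shape
    K = V.shape[0]
    return np.einsum('ij,kj->ikj', U, V).reshape(I * K, J)

def nn_cp(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Non-negative CP/PARAFAC: X[i,k,l] ~= sum_j A[i,j] B[k,j] C[l,j],
    fitted with multiplicative updates on each mode-n unfolding."""
    rng = np.random.default_rng(seed)
    I, K, L = X.shape
    A, B, C = rng.random((I, rank)), rng.random((K, rank)), rng.random((L, rank))
    for _ in range(n_iter):
        X0 = X.reshape(I, K * L)
        Z = khatri_rao(B, C)
        A *= (X0 @ Z) / (A @ (Z.T @ Z) + eps)
        X1 = np.moveaxis(X, 1, 0).reshape(K, I * L)
        Z = khatri_rao(A, C)
        B *= (X1 @ Z) / (B @ (Z.T @ Z) + eps)
        X2 = np.moveaxis(X, 2, 0).reshape(L, I * K)
        Z = khatri_rao(A, B)
        C *= (X2 @ Z) / (C @ (Z.T @ Z) + eps)
    return A, B, C

# Tiny demo on an exactly rank-2 non-negative tensor.
rng = np.random.default_rng(1)
At, Bt, Ct = rng.random((4, 2)), rng.random((5, 2)), rng.random((6, 2))
X = np.einsum('ij,kj,lj->ikl', At, Bt, Ct)
A, B, C = nn_cp(X, rank=2, n_iter=2000)
rel_err = np.linalg.norm(np.einsum('ij,kj,lj->ikl', A, B, C) - X) / np.linalg.norm(X)
```

Each sub-step is an ordinary NMF multiplicative update applied to one unfolding of the tensor, so the reconstruction error decreases monotonically.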
Category: Data Science

Matrix factorization: how to initialize weights and biases?

I have a matrix factorization model and I'm wondering how I should initialize its weights and biases. When getting a prediction (recommendation), after computing the dot product and adding the biases, I want to apply a sigmoid function to get a value between 0 and 1. But by introducing a sigmoid here, I also introduce a possible vanishing/exploding gradient problem. For that reason, I think the weights can be initialized with the Xavier scheme. But what about the biases? Should I just use a uniform distribution from …
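A minimal sketch of one common choice, assuming a plain NumPy model (all names and sizes here are illustrative): Glorot/Xavier-scaled normals for the factor matrices and zero-initialised biases, which keeps the pre-sigmoid scores near zero, where the sigmoid's gradient is largest:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 1000, 500, 32

# Glorot/Xavier scale: sqrt(2 / (fan_in + fan_out)) for each factor matrix.
P = rng.normal(0.0, np.sqrt(2.0 / (n_users + k)), (n_users, k))  # user factors
Q = rng.normal(0.0, np.sqrt(2.0 / (n_items + k)), (n_items, k))  # item factors
b_u = np.zeros(n_users)  # biases: zero is the usual default
b_i = np.zeros(n_items)

def predict(u, i):
    # Dot product plus biases, squashed into (0, 1) by the sigmoid.
    z = P[u] @ Q[i] + b_u[u] + b_i[i]
    return 1.0 / (1.0 + np.exp(-z))
```

Zero biases (or the global mean rating mapped through the inverse sigmoid, as an alternative) start every prediction near 0.5, so no unit begins in the saturated region of the sigmoid.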
Category: Data Science

Matrix Factorisation Improvement

I am using SGD matrix factorisation (in Python) on the MovieLens dataset to make recommendations. I have a website which allows users to give positive or negative feedback on whether an item is a good recommendation for a particular movie. I was wondering if I could use this feedback in my matrix factorisation, but I wasn't 100% sure how I would proceed. So, for example, I would have a matrix like:

     m1  m2  m3
m1    0   0   0
m2    5  …
Category: Data Science

Differences and similarities between nonnegative PCA and nonnegative matrix factorization

I have seen references in the literature to nonnegative principal component analysis (nPCA) and nonnegative matrix factorization (NMF). I have tried reading the papers on both of them, but it is not clear to me what the differences and similarities between them are. By similarity, I mean I am also interested in knowing when the nPCA and NMF methods will give the same solution. Can someone clarify this?
Category: Data Science

How do the authors get this update formula for all $\beta$ in the $\beta$-divergence?

I'm reading the paper Algorithms for nonnegative matrix factorization with the β-divergence by Cédric Févotte and Jérôme Idier. The scikit-learn package uses their algorithm for the sklearn.decomposition.NMF module. In section 4.1, they say: "An MM algorithm can be derived by minimizing the auxiliary function $G(\mathbf{h} \mid \tilde{\mathbf{h}})$ w.r.t. $\mathbf{h}$. Given the convexity and the separability of the auxiliary function, the optimum is obtained by canceling the gradient given by Eq. (36). This is trivially done and leads to the following update:" …
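For context, the multiplicative update this passage leads to has, if I recall the paper correctly (treat this as an assumption and check it against Eq. (36) and the update that follows it), the form

```latex
\mathbf{h} \;\leftarrow\; \mathbf{h} \circ
\frac{W^{\top}\!\left[(W\mathbf{h})^{\cdot(\beta-2)} \circ \mathbf{v}\right]}
     {W^{\top}(W\mathbf{h})^{\cdot(\beta-1)}}
```

where $\circ$ and the fraction bar are entry-wise and $(\cdot)^{\cdot p}$ denotes an entry-wise power. Because the auxiliary function is convex and separable across the entries of $\mathbf{h}$, setting its gradient to zero decouples into one scalar equation per entry, and solving each one gives exactly this ratio of the negative to the positive part of the gradient.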
Category: Data Science

Treating recommender systems as multiclass classification or binary classification problem

I'm thinking about the following two approaches for building a recommender system, as a classifier, to recommend products using implicit data: (1) treat it as a multi-class classification problem, where the features of the model are the user features and the target is the item (this is the approach used in this Google documentation); or (2) treat it as a binary classification problem, where the features of the model are the user and item features, and the target is a binary variable indicating whether …
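A minimal sketch of the second (binary) framing, with hypothetical column names: observed interactions become positives, and randomly sampled unobserved (user, item) pairs become negatives:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Toy implicit data: each row is an observed (user, item) interaction.
clicks = pd.DataFrame({"user": [0, 0, 1, 2], "item": [10, 11, 10, 12]})

pos = clicks.assign(label=1)                # observed pairs -> positives
seen = set(zip(clicks["user"], clicks["item"]))
all_items = clicks["item"].unique()

# Negative sampling: for each user, draw items and keep the unseen ones.
neg_rows = []
for u in clicks["user"].unique():
    for it in rng.choice(all_items, size=2, replace=False):
        if (u, it) not in seen:
            neg_rows.append({"user": u, "item": it, "label": 0})

train = pd.concat([pos, pd.DataFrame(neg_rows)], ignore_index=True)
```

Any binary classifier can then be fitted on (user features, item features) with `label` as the target; the negative-to-positive ratio is a tuning knob, not a given.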
Category: Data Science

Advantages of matrix factorization when the number of products is low

I'm building a recommender system where the number of products is rather low (around 50), and we can assume it will stay the same for a long time. I'm looking at two different ways of tackling the problem: using a matrix factorization technique, or treating it as a multi-class classification problem with a target of 50 different possible values. The features I'm using are the ones that the matrix factorization technique uses implicitly: the number of times a user has bought product 1, …
Category: Data Science

How do I recommend items to out-of-training users based on their recent views?

I used Spark's ALS implementation of matrix factorization (collaborative filtering for implicit feedback) to train user and item embeddings. Since we have a lot of users in the system, I had to sample some users to train the model to avoid overfitting. Now, how do I construct user embeddings for out-of-training users? I tried constructing user embeddings by averaging the item embeddings of a user's items. But when I compared the performance of the average vector vs. the original user embeddings, it is not that …
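One common alternative to plain averaging, sketched below with synthetic data under the assumption of an implicit-ALS-style model, is to "fold in" a new user: solve the same ridge least-squares problem ALS solves for a user, holding the trained item factors fixed:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, k, lam = 100, 16, 0.1
Y = rng.normal(size=(n_items, k))   # stand-in for trained ALS item factors
viewed = [3, 17, 42]                # the new user's recent items

avg_u = Y[viewed].mean(axis=0)      # baseline: plain average of item vectors

# Fold-in: argmin_u  sum_{i in viewed} (1 - u . y_i)^2 + lam * ||u||^2
Yv = Y[viewed]
u = np.linalg.solve(Yv.T @ Yv + lam * np.eye(k), Yv.T @ np.ones(len(viewed)))

scores = Y @ u                      # rank all items for this user
top = np.argsort(-scores)[:10]
```

The averaged vector minimises no objective, while the fold-in vector reproduces what ALS itself would have assigned this user given those interactions, which often explains the performance gap.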
Category: Data Science

Why do we need 2 matrices for word2vec or GloVe

Word2vec and GloVe are the two best-known word embedding methods. Many works have pointed out that these two models are actually very close to each other, and that under some assumptions they perform a matrix factorization of the PPMI matrix of word co-occurrences in the corpus. Still, I can't understand why we actually need two matrices (and not one) for these models. Couldn't we use the same one for U and V? Is it a problem with the …
Category: Data Science

Calculate implicit rating from streaming behaviour for Recommendation Engine

I have a dataset containing user stream data for particular videos, like below:

u_id | start_stream_time_dt | watch_time_ms | video_category
1    | 2021-02-01           | 3600          | Live

My goal is to build a recommender system for watch streams. However, I would like to find the optimal watch_stream threshold (or another approach) that would allow me to define whether a user has indeed watched a video because he/she is interested. In other words, I'd like to fill in the 1s in the user_item matrix based on this information I …
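As one possible starting point (a sketch with made-up data, not a claim about the "right" threshold), a per-category quantile of watch time can be used to binarise streams into implicit ratings:

```python
import pandas as pd

# Toy version of the streaming log above.
df = pd.DataFrame({
    "u_id": [1, 1, 2, 2, 3],
    "watch_time_ms": [3600, 120, 5000, 4000, 50],
    "video_category": ["Live", "Live", "Live", "VoD", "VoD"],
})

# Per-category median watch time as the "really watched" threshold;
# a stream at or above its category's median becomes a 1 in the user-item matrix.
threshold = df.groupby("video_category")["watch_time_ms"].transform("median")
df["implicit_rating"] = (df["watch_time_ms"] >= threshold).astype(int)
```

Normalising by video duration (watch ratio instead of raw milliseconds), when duration is available, is usually a stronger signal than raw watch time.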
Category: Data Science

Non-negative Matrix Factorization for clustering

I'm learning to use NMF to do clustering, based on the reading What is a good explanation of Non Negative Matrix Factorization? and https://iksinc.online/2016/03/21/what-is-nmf-and-what-can-you-do-with-it/. The first link mentions that normalization is necessary as a preprocessing step. My question is: if we normalize the features, there will be negative values in the data. Isn't non-negativity necessary for this method? For NMF, https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html , it seems you still need to define n_components beforehand. Does it depend on domain knowledge or any …
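One way to reconcile normalization with the non-negativity constraint, sketched below on synthetic data, is min-max scaling (which keeps every value in [0, 1]) rather than z-scoring; the cluster of each sample is then the argmax over the columns of W:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((40, 5)) * np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # mixed scales

# Min-max scaling equalises feature ranges without producing negatives,
# unlike StandardScaler (z-scoring), which would break NMF's input constraint.
X01 = MinMaxScaler().fit_transform(X)

model = NMF(n_components=3, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(X01)    # (n_samples, n_components) weights
labels = W.argmax(axis=1)       # hard cluster assignment per sample
```

As for `n_components`: there is no closed-form answer; it is typically chosen by domain knowledge or by scanning values and watching the reconstruction error plateau.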
Category: Data Science

Temporal train test split for recommender systems

When evaluating a collaborative filtering recommender system, it is practical to split the data temporally. However, by doing so, some users might be present in only one of the train or test sets. For example, consider the example below:

user  year
0     2020
0     2020
0     2021
1     2021
1     2021
1     2021
2     2020
2     2021
2     2021

If we decide to split by year such that ratings after 2020 will be in the test set, then: Train user …
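The split described above can be sketched in pandas, dropping cold-start users from the test set on the assumption that a plain collaborative filter cannot score users it never saw in training:

```python
import pandas as pd

# The example table from the question.
ratings = pd.DataFrame({
    "user": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "year": [2020, 2020, 2021, 2021, 2021, 2021, 2020, 2021, 2021],
})

train = ratings[ratings["year"] <= 2020]   # everything up to the cutoff
test = ratings[ratings["year"] > 2020]     # everything after it

# Keep only "warm" test users, i.e. users that also appear in train.
warm_test = test[test["user"].isin(train["user"].unique())]
```

Here user 1 has no 2020 ratings, so it vanishes from the filtered test set; how to evaluate such cold-start users (separately, or via a content-based fallback) is a deliberate design choice, not something the split decides for you.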
Category: Data Science

Negative Latent Factors in Factorized Machines

I'm studying a specific implementation of a recommendation system that leverages a factorization machine algorithm. For each person_id and item_id combination, I have an implicit rating of 1 or 0, depending on whether the user downloaded the content or not. In the base model, I have used just the person_id and the item_id as input variables. I selected a latent factor number equal to 5. In the model output, some of the 5 latent factors associated with some person_id …
Category: Data Science

What is the best model for a recommendation system using implicit ratings?

I have a similarity matrix that looks like this: I have a bunch of user vectors of 1s and 0s, with a 1 indicating that someone has clicked on an email (as part of a campaign) and a 0 indicating they haven't. These are implicit ratings as I have come to understand them. In terms of an approach, after researching different options, it would appear that the best algorithm to start with is a matrix-factorisation approach. My question is: what is …
Category: Data Science

How to filter Items in Recommender Systems?

I have a recommender system which recommends articles based on similarity over 3 features: page title, article content, and tags. But some of the articles are NSFW (related to adult topics). I want to keep these recommendations separate from the normal ones. Any idea how I should go ahead with this? I was thinking about keeping the adult articles separate and using another metric to recommend them, but I don't feel it's right, as it would create a whole different recommender …
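One simple alternative to training a second recommender, sketched below with hypothetical fields, is to keep a single model and filter on an NSFW flag at serving time:

```python
# Toy candidate list: each article carries its model score and an NSFW flag.
articles = [
    {"id": 1, "score": 0.9, "nsfw": False},
    {"id": 2, "score": 0.8, "nsfw": True},
    {"id": 3, "score": 0.7, "nsfw": False},
]

def recommend(articles, allow_nsfw=False, k=10):
    # Score everything with the one model, then gate NSFW items per request.
    pool = [a for a in articles if allow_nsfw or not a["nsfw"]]
    return sorted(pool, key=lambda a: -a["score"])[:k]
```

This keeps the similarity model unchanged and moves the policy decision (who may see NSFW content) out of the model and into the serving layer, where it is easy to audit and change.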
Category: Data Science

What is the meaning of $\mathbb{R}^{768\times (768 * 2)}$?

Hi, I'm an undergraduate student interested in machine learning. I was reading a paper from ICLR 2020 and came across a weird-looking vector dimension. Can anyone tell me what this means? $\mathbb{R}^{768\times (768 * 2)}$ Does this mean that in a Python NumPy array the shape would probably be (2, 768, 768)? I remember reading that NumPy array dimensions are reversed from the actual vector dimension representation. And the vector I asked about shows up on page …
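For what it's worth, $\mathbb{R}^{768\times(768*2)}$ denotes real matrices with 768 rows and $768 \cdot 2 = 1536$ columns, i.e. a single 2-D array, not a 3-D one:

```python
import numpy as np

# R^(768 x (768*2)) is a matrix: 768 rows, 1536 columns.
W = np.zeros((768, 768 * 2))
```

Such a shape often arises when two 768x768 weight matrices are concatenated side by side along the column axis.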
Category: Data Science

What types of matrix multiplication are used in Machine Learning? When are they used?

I'm looking at equations for neural networks and backpropagation, and I see this symbol in the equations, ⊙. I thought matrix multiplication in neural networks always involved matrices with matching inner dimensions, such as [3, 3] @ [3, 2] (this is what is happening in the animated GIF). What part of a neural net uses a Hadamard product, and which uses the Kronecker product? Because I see this notation for the Hadamard product (⊙) in papers and deep learning …
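For concreteness, the two operations side by side in NumPy:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[10.0, 20.0], [30.0, 40.0]])

matmul = A @ B    # matrix product: inner dimensions must match, e.g. (3,3)@(3,2)
hadamard = A * B  # Hadamard product (⊙): elementwise, shapes must be identical
```

In backpropagation, the Hadamard product typically appears when the upstream gradient is multiplied elementwise by the activation function's derivative, while ordinary matrix products carry gradients through the weight matrices.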
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.