Dealing with missing data in SVD

I am new to machine learning and am trying to apply SVD to the MovieLens dataset for movie recommendation. I have a movie-user matrix where each row is a user ID, each column is a movie ID, and each value is a rating. I would like to normalize the matrix by subtracting each user's mean rating, then pass the normalized matrix to svds from scipy.sparse.linalg as follows: from scipy.sparse.linalg import svds U, sigma, …
Category: Data Science
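
A minimal sketch of the normalization step described in this question, assuming missing ratings are marked with NaN and are filled with zero after centering (which amounts to treating them as equal to the user's mean). The toy matrix, the variable names, and the zero-fill choice are illustrative assumptions, not the asker's code or a recommended final method:

```python
import numpy as np
from scipy.sparse.linalg import svds

# Hypothetical toy data: rows are users, columns are movies, NaN = not rated.
ratings = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [1.0, np.nan, 4.0, 4.0],
    [np.nan, 1.0, 5.0, 4.0],
])

observed = ~np.isnan(ratings)

# Per-user mean computed over the observed ratings only.
user_means = np.nanmean(ratings, axis=1, keepdims=True)

# Center the observed entries; missing entries become 0 (assumption:
# an unrated movie is treated as "rated at the user's own mean").
centered = np.where(observed, ratings - user_means, 0.0)

# svds requires 1 <= k < min(centered.shape).
k = 2
U, sigma, Vt = svds(centered, k=k)

# Rank-k reconstruction, shifted back by each user's mean.
predicted = U @ np.diag(sigma) @ Vt + user_means
```

Filling with zero after centering is only one simple option; whether it is appropriate depends on how sparse the matrix is and on how the predictions are evaluated.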

How to estimate missing values when calculating NDCG

I would like to compare recommendation methods using the NDCG metric on the MovieLens dataset. In a ranking problem, the goal is to rank items by their relevance to the user. Ranking models can be learned from a ratings matrix in which each user rates a small subset of all items; ratings for the other items are unknown. Collaborative filtering methods can be used to build a model that generalizes from the training data and predicts ratings for unrated items. Let's consider the following example on a dataset consisting of 5 …
Category: Data Science
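
As a rough illustration of the metric in this question, here is a sketch of NDCG@k for a single user. It assumes one particular convention for unknown ratings: items whose true rating is NaN are simply left out of the evaluation. The function names `dcg_at_k` and `ndcg_at_k`, the exponential gain, and the toy numbers are all assumptions made for the example:

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k relevances (exponential gain)."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(np.sum((2.0 ** rel - 1.0) / discounts))

def ndcg_at_k(true_ratings, predicted_scores, k):
    """NDCG@k for one user, evaluated only on items with a known rating."""
    true_ratings = np.asarray(true_ratings, dtype=float)
    predicted_scores = np.asarray(predicted_scores, dtype=float)

    # Assumption: items with an unknown (NaN) rating are excluded.
    known = ~np.isnan(true_ratings)
    rel = true_ratings[known]
    scores = predicted_scores[known]

    # Rank the known items by predicted score, highest first.
    order = np.argsort(-scores, kind="stable")
    ideal = np.sort(rel)[::-1]

    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0.0:
        return 0.0
    return dcg_at_k(rel[order], k) / ideal_dcg

# Hypothetical user: 5 items, two of them unrated.
true_ratings = [5.0, np.nan, 3.0, np.nan, 1.0]
good_model = [0.9, 0.8, 0.5, 0.7, 0.1]   # ranks the rated items correctly
bad_model = [0.1, 0.8, 0.5, 0.7, 0.9]    # ranks the rated items in reverse

ndcg_good = ndcg_at_k(true_ratings, good_model, k=3)
ndcg_bad = ndcg_at_k(true_ratings, bad_model, k=3)
```

Other conventions exist, such as treating unrated items as having zero relevance; the choice changes the scores, so it should be the same for every method being compared.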

Dot product of two matrices in NLP: how can this error be solved?

from sklearn.metrics.pairwise import linear_kernel
sim_matrix = linear_kernel(tfidf_matrix, tfidf_matrix)

When I try to compute the dot product, I get this error:

MemoryError Traceback (most recent call last)
<ipython-input-19-2c4d43d4a89e> in <module>
1 from sklearn.metrics.pairwise import linear_kernel
----> 2 sim_matrix = linear_kernel(tfidf_matrix, tfidf_matrix)
~\anaconda3\lib\site-packages\sklearn\metrics\pairwise.py in linear_kernel(X, Y, dense_output)
1002 """
1003 X, Y = check_pairwise_arrays(X, Y)
-> 1004 return safe_sparse_dot(X, Y.T, dense_output=dense_output)
1005
1006
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs) …
Category: Data Science
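
One common way around a MemoryError of this kind, assuming only the few most similar rows per document are actually needed, is to compute the product in row chunks and keep just the top k entries of each chunk. The sketch below uses a small random sparse matrix as a stand-in for `tfidf_matrix`; the helper name `top_k_similar` and its parameters are hypothetical:

```python
import numpy as np
from scipy import sparse

def top_k_similar(matrix, k=3, chunk_size=2):
    """Return indices and values of the k largest dot products per row,
    excluding each row's product with itself, without ever building
    the full n x n similarity matrix."""
    matrix = sparse.csr_matrix(matrix)
    n_rows = matrix.shape[0]
    k = min(k, n_rows - 1)
    top_idx = np.empty((n_rows, k), dtype=np.int64)
    top_val = np.empty((n_rows, k), dtype=np.float64)
    transposed = matrix.T.tocsr()

    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        # Dense block of shape (stop - start, n_rows): the only large temporary.
        block = (matrix[start:stop] @ transposed).toarray()
        # Mask out self-similarity so a row is never its own neighbour.
        block[np.arange(stop - start), np.arange(start, stop)] = -np.inf
        idx = np.argsort(-block, axis=1)[:, :k]
        top_idx[start:stop] = idx
        top_val[start:stop] = np.take_along_axis(block, idx, axis=1)
    return top_idx, top_val

# Stand-in for a TF-IDF matrix (assumption: 6 documents, 5 terms).
rng = np.random.default_rng(0)
dense = rng.random((6, 5))
dense[dense < 0.5] = 0.0
tfidf_like = sparse.csr_matrix(dense)

top_idx, top_val = top_k_similar(tfidf_like, k=2, chunk_size=4)
```

Peak memory is then governed by `chunk_size * n_rows` instead of `n_rows ** 2`. For large inputs, `np.argpartition` would avoid fully sorting each row; `np.argsort` is used here only to keep the sketch short.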

Memory error in matrix cosine_similarity

I have a dataset of shape (20905040, 7) from which I want to recommend 10 different products to the user. It could be larger than that, but I already get a memory error when computing:

cosine_sim = cosine_similarity(normalized_df,normalized_df)

---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
in
1 get_ipython().run_line_magic('time', '')
----> 2 cosine_sim = cosine_similarity(normalized_df,normalized_df)
~/venv/lib/python3.6/site-packages/sklearn/metrics/pairwise.py in cosine_similarity(X, Y, dense_output)
1034
1035 K = safe_sparse_dot(X_normalized, Y_normalized.T,
-> 1036 dense_output=dense_output)
1037
1038 return K
~/venv/lib/python3.6/site-packages/sklearn/utils/extmath.py in safe_sparse_dot(a, b, dense_output)
140 return ret
141 else:
--> 142 return …
Category: Data Science
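
A quick back-of-the-envelope check shows why this call cannot fit in memory: a full pairwise similarity matrix has one entry per pair of rows. The sketch below assumes 8-byte floats for the similarities and, for comparison, 8-byte integer indices if only the 10 best matches per row were stored; the helper name `dense_similarity_bytes` is hypothetical:

```python
import numpy as np

def dense_similarity_bytes(n_rows, dtype=np.float64):
    """Bytes needed to hold a dense n_rows x n_rows similarity matrix."""
    return n_rows * n_rows * np.dtype(dtype).itemsize

n_rows = 20_905_040  # number of rows reported in the question

# Full pairwise matrix, as cosine_similarity(X, X) would return it.
full_bytes = dense_similarity_bytes(n_rows)

# Keeping only 10 neighbours per row: one float64 score and one int64 index each.
per_entry = np.dtype(np.float64).itemsize + np.dtype(np.int64).itemsize
top_k_bytes = n_rows * 10 * per_entry

print(f"full matrix:    {full_bytes / 1e15:.2f} PB")
print(f"top-10 per row: {top_k_bytes / 1e9:.2f} GB")
```

The full matrix comes out on the order of petabytes, while storing only the top 10 matches per row needs a few gigabytes, which is why approaches that never materialize the whole matrix are usually needed at this scale.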

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.