CountVectorizer fit_transform cosine similarity MemoryError
# count is a CountVectorizer instance (its construction is not shown here)
count_matrix = count.fit_transform(off_data3['bag_of_words'])
The resulting count_matrix has this shape:
count_matrix.shape
(476147, 482824)
from sklearn.metrics.pairwise import cosine_similarity

cosine_sim = cosine_similarity(count_matrix, count_matrix)
I think the matrix is simply too big, which is what causes this memory error:
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
in

~/venv/lib/python3.6/site-packages/sklearn/metrics/pairwise.py in cosine_similarity(X, Y, dense_output)
   1034
   1035     K = safe_sparse_dot(X_normalized, Y_normalized.T,
-> 1036                         dense_output=dense_output)
   1037
   1038     return K

~/venv/lib/python3.6/site-packages/sklearn/utils/extmath.py in safe_sparse_dot(a, b, dense_output)
    135     """
    136     if sparse.issparse(a) or sparse.issparse(b):
--> 137         ret = a * b
    138         if dense_output and hasattr(ret, "toarray"):
    139             ret = ret.toarray()

~/venv/lib/python3.6/site-packages/scipy/sparse/base.py in __mul__(self, other)
    479             if self.shape[1] != other.shape[0]:
    480                 raise ValueError('dimension mismatch')
--> 481             return self._mul_sparse_matrix(other)
    482
    483         # If it's a list or whatever, treat it like a matrix

~/venv/lib/python3.6/site-packages/scipy/sparse/compressed.py in _mul_sparse_matrix(self, other)
    514                                          maxval=nnz)
    515         indptr = np.asarray(indptr, dtype=idx_dtype)
--> 516         indices = np.empty(nnz, dtype=idx_dtype)
    517         data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))
    518

MemoryError:
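As far as I can tell the allocation already fails while building the sparse product (the indices array with nnz entries), and the dense 476147 x 476147 result that cosine_similarity would return afterwards seems hopeless anyway. My rough back-of-the-envelope estimate of its size:

n_docs = 476_147
dense_bytes = n_docs * n_docs * 8   # float64 n x n similarity matrix
print(dense_bytes / 1e12)           # roughly 1.81 TB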
Any tips on how to avoid this memory error when working with such a large matrix?
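One workaround I am considering (only a sketch, not tested) is to keep just the k most similar documents per row with a brute-force nearest-neighbour search on the sparse matrix, so the full n x n similarity matrix is never materialised:

from sklearn.neighbors import NearestNeighbors

# Brute-force cosine distance works directly on the sparse count_matrix and
# returns only the k nearest neighbours per document, not the full n x n matrix.
nn = NearestNeighbors(n_neighbors=10, metric='cosine', algorithm='brute')
nn.fit(count_matrix)
distances, indices = nn.kneighbors(count_matrix)   # cosine similarity = 1 - distance

Would that be a reasonable approach, or is computing cosine_similarity in row chunks (one slice of count_matrix at a time against the whole matrix) the better idea?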
Topic: data-analysis, cosine-distance, nlp, machine-learning
Category: Data Science