Reducing dataset before computing similarity matrix
I'm writing my thesis and am trying to compute a similarity matrix of houses. My dataset contains 500,000 houses, and I need the pairwise similarity between all of them, i.e. 500,000 × 500,000 cells. R cannot allocate a matrix that large and fails with:
Error: cannot allocate vector of size 2664.8 Gb
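For scale, a back-of-envelope check (a sketch; it assumes a dense numeric matrix of 8-byte doubles, and the quoted error suggests the actual row count or intermediate copies are somewhat larger):

    n <- 500000          # number of houses
    cells <- n * n       # pairwise similarity cells
    bytes <- cells * 8   # a dense numeric (double) matrix in R
    bytes / 1024^3       # ~1862.6 GiB -- the same order as the error above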
Does anyone know a smart solution to this problem?
Topics: matrix, memory, similarity
Category: Data Science