Reducing dataset before computing similarity matrix

I'm writing my thesis and am trying to compute a similarity matrix for houses. My dataset contains 500,000 houses, and I need the pairwise similarity between all of them. That means computing 500,000 × 500,000 cells, which R cannot handle; I get the error:

Error: cannot allocate vector of size 2664.8 Gb
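For a rough sense of scale (a back-of-envelope check in R, not the exact allocation the error refers to): a dense double-precision 500,000 × 500,000 matrix alone takes roughly 1.8 TiB, and even a lower-triangular dist-style object is close to 1 TiB, so the full matrix can never fit in memory.

n <- 500000
n * n * 8 / 2^30            # ~1862.6 GiB for the full n x n matrix of doubles
n * (n - 1) / 2 * 8 / 2^30  # ~931.3 GiB even for a lower-triangular dist object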

Does anyone know a smart solution to this problem?

Tags: matrix, memory, similarity

Category: Data Science
