scalable tools to build kNN graph over sparse data
I'm looking for scalable tools to build kNN graph over sparse data points.
The dimension and number of data points can be both up to millions.
What I have tried already:
- sklearn.neighbors.kneighbors_graph: which does brute-force search for sparse data, giving quadratic time.
- flann: only supports dense arrays
- pysparnn: the running time is not very satisfatory (maybe because it's written in Python)
- knn search in mlpack: which only supports dense data
- scipy.spatial.KDTree: which converts the sparse data to dense one
- SparseLSH: which is implemented in Python, so I'm not quite sure about the scalability
- elasticsearch: it seems to only support indexing documents, instead of sparse features.
- the reason I thought of elasticsearch is: knn over sparse data can be framed as retrieving the top-k "documents" in IR.
Thanks for any comments/answers :)
Topic k-nn search-engine data-mining machine-learning
Category Data Science