Best method for similarity search on 10,000 data points with 8,000 features each in Python?
As mentioned in the title, I am trying to run similarity search over 10,000 vectors with 8,000 features each, all in Python. Currently the vectors are saved in their own directories as pickled NumPy arrays. The features were extracted from a deep neural network.
I am new to this area, but I have heard of M-trees, R-trees, inverted indexes, and hashing. Is any one of them better suited to such a large number of features?
This is just a prototype and needs to be built quickly, so simplicity is valuable.
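For reference, at this scale the simplest baseline I can think of is brute-force cosine similarity with one matrix-vector product. Here is a minimal sketch of that idea; the random matrix is just a placeholder standing in for my real vectors, and `top_k_cosine` is a name I made up:

```python
import numpy as np

# Placeholder data: in reality these rows come from pickled NumPy arrays.
rng = np.random.default_rng(0)
X = rng.random((10_000, 8_000), dtype=np.float32)  # 10,000 vectors x 8,000 features

def top_k_cosine(X, query, k=5):
    """Return indices of the k rows of X most cosine-similar to query."""
    # L2-normalize rows and the query so a dot product equals cosine similarity.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    qn = query / np.linalg.norm(query)
    sims = Xn @ qn                       # one matrix-vector product
    idx = np.argpartition(-sims, k)[:k]  # unordered top-k candidates
    return idx[np.argsort(-sims[idx])]   # sort the k candidates by similarity

# Querying with a vector from the dataset should return that vector first.
neighbors = top_k_cosine(X, X[0], k=5)
```

Would a tree or hashing structure actually beat something like this at 10,000 points, given that tree-based indexes reportedly degrade in very high dimensions?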
Thank you for your help.
Tags: search-engine, python, search
Category: Data Science