What is the difference between Okapi bm25 and NMSLIB?
I was trying to build a search system and came across Okapi bm25, which is a ranking function like tf-idf. You can build an index of your corpus and later retrieve the documents that best match your query.
I imported the Python library rank_bm25 and built a search system with it, and the results were satisfying.
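For readers unfamiliar with bm25, here is a minimal pure-Python sketch of the kind of scoring such a library performs. It is an illustration under stated assumptions, not the rank_bm25 implementation: the helper names `build_bm25_index` and `bm25_scores`, the toy corpus, the parameter values `k1=1.5` and `b=0.75`, and the smoothed IDF variant (with `+ 1.0` inside the logarithm) are all chosen here for the example and may differ from what rank_bm25 does internally.

```python
import math
from collections import Counter


def build_bm25_index(tokenized_docs):
    """Precompute term frequencies, document lengths, and IDF values."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))  # count each term once per document
    # Assumption: smoothed IDF variant that never goes negative.
    idf = {
        term: math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        for term, df in doc_freq.items()
    }
    return {
        "term_freqs": [Counter(doc) for doc in tokenized_docs],
        "doc_lens": [len(doc) for doc in tokenized_docs],
        "avg_len": sum(len(doc) for doc in tokenized_docs) / n_docs,
        "idf": idf,
    }


def bm25_scores(index, query_tokens, k1=1.5, b=0.75):
    """Score every indexed document against the query tokens."""
    scores = []
    for tf, doc_len in zip(index["term_freqs"], index["doc_lens"]):
        # Length normalisation: longer documents are penalised.
        norm = k1 * (1.0 - b + b * doc_len / index["avg_len"])
        score = 0.0
        for term in query_tokens:
            f = tf.get(term, 0)
            if f:
                score += index["idf"][term] * f * (k1 + 1.0) / (f + norm)
        scores.append(score)
    return scores


# Hypothetical toy corpus, tokenized by whitespace.
corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock markets fell sharply today",
]
index = build_bm25_index([doc.split() for doc in corpus])
scores = bm25_scores(index, "cat on the mat".split())
best = max(range(len(corpus)), key=scores.__getitem__)
print(corpus[best])
```

Note that the scoring is purely lexical: a document gets credit only for query terms it literally contains, which is why a query copied from a document tends to rank that document first.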
Then I came across something called the Non-Metric Space Library (NMSLIB). As I understand it, it is a similarity search library, much like the kNN algorithm.
I saw an example where someone was trying to build a smart search system using nmslib. He did the following:
- tokenized the documents
- passed the tokens into a fastText model to create word vectors
- then combined those word vectors with bm25 weights
- then passed the combination into nmslib
- performed the search.
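For illustration, the steps above can be sketched roughly as follows. This is a simplified stand-in under several assumptions, not the code from the example: the tiny hand-written `WORD_VECS` table replaces a trained fastText model, the helper names are hypothetical, the bm25 weighting uses a smoothed IDF variant with `k1=1.5` and `b=0.75`, and an exact brute-force cosine ranking replaces the approximate nearest-neighbour index that nmslib would build.

```python
import math
from collections import Counter

# Hypothetical 3-d word vectors standing in for a trained fastText model.
WORD_VECS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.0],
    "mat": [0.6, 0.3, 0.1],
    "dog": [0.1, 0.9, 0.0],
    "park": [0.2, 0.7, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "market": [0.1, 0.0, 0.9],
}


def bm25_term_weights(tokenized_docs, k1=1.5, b=0.75):
    """Return one {term: bm25 weight} dict per document."""
    n_docs = len(tokenized_docs)
    avg_len = sum(len(doc) for doc in tokenized_docs) / n_docs
    doc_freq = Counter(t for doc in tokenized_docs for t in set(doc))
    weights = []
    for doc in tokenized_docs:
        norm = k1 * (1.0 - b + b * len(doc) / avg_len)
        per_doc = {}
        for term, f in Counter(doc).items():
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            per_doc[term] = idf * f * (k1 + 1.0) / (f + norm)
        weights.append(per_doc)
    return weights


def weighted_doc_vector(term_weights, dim=3):
    """Weighted average of word vectors; words without a vector are skipped."""
    total = [0.0] * dim
    weight_sum = 0.0
    for term, w in term_weights.items():
        vec = WORD_VECS.get(term)
        if vec is None:
            continue
        total = [t + w * v for t, v in zip(total, vec)]
        weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else total


def cosine(u, v):
    """Cosine similarity, returning 0.0 for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Steps 1-3: tokenized documents -> bm25-weighted document vectors.
docs = [["cat", "mat"], ["dog", "park"], ["stock", "market"]]
doc_vecs = [weighted_doc_vector(w) for w in bm25_term_weights(docs)]

# Steps 4-5: embed the query and rank documents by cosine similarity.
# "kitten" never occurs in the corpus, yet it still gets a vector.
query_vec = weighted_doc_vector({"kitten": 1.0})
ranked = sorted(
    range(len(docs)),
    key=lambda i: cosine(query_vec, doc_vecs[i]),
    reverse=True,
)
print(ranked)
```

In this setup each document is collapsed into a single dense vector, so the search matches on overall vector similarity rather than on exact shared terms.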
If the link above does not open the document, try opening it in incognito mode.
It was quite fast, but the results were not satisfying: even when I copy-pasted an exact query from a document, it did not return that document. The search system I built with rank_bm25, on the other hand, gave great results. So my conclusion was that bm25 gave good results and nmslib gave faster results.
My questions are:
- How do bm25 and nmslib differ?
- How can I pass bm25 weights to nmslib to create a better and faster search engine?
- In short, how can I combine the strengths of both bm25 and nmslib?
Topic search-engine python-3.x nlp information-retrieval
Category Data Science