Semantic Search
We are trying to do semantic search over our own data set, which is domain-specific (for example, sentences about automobiles).
Our data is just a collection of sentences, and we want to give a phrase and get back the sentences that:
- Are similar to the phrase
- Contain a part that is similar to the phrase
- Have a contextually similar meaning
For example, if I search for the phrase "Buying Experience", I should get sentences like:
I never thought car buying could take less than 30 minutes to sign and buy.
I found a car that I liked and the purchase process was straightforward and easy
I absolutely hated going car shopping, but today I’m glad I did
I want to emphasize that we are looking for contextual similarity, not a brute-force keyword search.
The search should find a sentence even when it uses different words from the query.
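To make the "contains a part that is similar" requirement concrete, here is a minimal sketch of one possible scoring scheme (an assumption, not something we have evaluated): compare the query embedding with the embedding of the whole sentence and of every short word window inside it, and keep the best cosine similarity. `embed` can be any function that returns a vector; in practice it could wrap a sentence encoder such as sentence-transformers' `SentenceTransformer.encode`. The window sizes, `toy_embed`, and its `CONCEPTS` table are hypothetical stand-ins so the sketch runs without a model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def spans(sentence: str, min_len: int = 2, max_len: int = 6) -> List[str]:
    """The whole sentence plus every contiguous window of min_len..max_len words."""
    words = sentence.split()
    out = [sentence]
    for n in range(min_len, min(max_len, len(words)) + 1):
        for i in range(len(words) - n + 1):
            out.append(" ".join(words[i:i + n]))
    return out


def score(query: str, sentence: str, embed: Callable[[str], Vector]) -> float:
    """Best cosine similarity between the query and the sentence or any of its windows."""
    q = embed(query)
    return max(cosine(q, embed(s)) for s in spans(sentence))


def search(query: str, sentences: Sequence[str],
           embed: Callable[[str], Vector], top_k: int = 3) -> List[Tuple[float, str]]:
    """Rank sentences by score(), highest first, and return the top_k (score, sentence) pairs."""
    ranked = sorted(((score(query, s, embed), s) for s in sentences), reverse=True)
    return ranked[:top_k]


# Hypothetical toy "embedding" so the sketch runs without a model: each known
# word adds 1 to a hand-picked concept axis; every other word goes to axis 3.
# Replace with a real sentence encoder in practice.
CONCEPTS = {
    "buy": 0, "buying": 0, "purchase": 0, "shopping": 0,
    "experience": 1, "process": 1,
    "car": 2, "cars": 2,
}


def toy_embed(text: str) -> List[float]:
    vec = [0.0] * 4
    for word in text.lower().split():
        vec[CONCEPTS.get(word.strip(".,!?"), 3)] += 1.0
    return vec


sentences = [
    "I never thought car buying could take less than 30 minutes to sign and buy.",
    "I found a car that I liked and the purchase process was straightforward and easy",
    "The engine is loud",
]
for s, text in search("Buying Experience", sentences, toy_embed):
    print(f"{s:.2f}  {text}")
```

With the toy embedding, the "purchase process" sentence ranks first even though it shares no words with the query, because its two-word window "purchase process" lands on the same concept axes as "Buying Experience". Scoring windows as well as whole sentences keeps a short relevant phrase from being diluted by the rest of a long sentence, at the cost of many more embedding calls per sentence.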
Things that we have already tried:
1. Open Semantic Search (https://www.opensemanticsearch.org/): the problem here was generating an ontology from our data or, for that matter, finding existing ontologies for the domains we are interested in.
2. Elasticsearch (BM25 + tf-idf vectors): it returned a few relevant sentences, but both precision and accuracy were poor. Evaluated against a human-curated dataset, it found only about 10% of the expected sentences.
3. Various embedding models, such as the ones listed in https://github.com/UKPLab/sentence-transformers, following the example https://github.com/UKPLab/sentence-transformers/blob/master/examples/application_semantic_search.py. Evaluated against our human-curated set, this also had very low accuracy.
4. ELMo (https://towardsdatascience.com/elmo-contextual-language-embedding-335de2268604): this was better, but accuracy was still lower than we expected, and it is hard to decide the cosine-similarity threshold below which a sentence should be discarded. The same threshold problem applies to point 3.
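The cosine-threshold problem can be made less subjective by picking the cutoff from the human-curated set instead of by eye. A minimal sketch, assuming the curated set tells us which sentence IDs are relevant to a query and that the search returns a similarity score for every candidate; `precision_recall`, `best_threshold`, and the sample IDs and scores are hypothetical. It tries every observed score as a cutoff and keeps the one with the best F1.

```python
from typing import AbstractSet, Hashable, Iterable, Sequence, Tuple


def precision_recall(retrieved: Iterable[Hashable],
                     relevant: AbstractSet[Hashable]) -> Tuple[float, float]:
    """Precision and recall of a retrieved set against a human-curated relevant set."""
    got = set(retrieved)
    hits = len(got & relevant)
    precision = hits / len(got) if got else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def best_threshold(scored: Sequence[Tuple[Hashable, float]],
                   relevant: AbstractSet[Hashable]) -> Tuple[float, float]:
    """Try every observed score as a cutoff (keep items with score >= cutoff)
    and return (cutoff, F1) for the cutoff with the highest F1.
    Returns (inf, 0.0), meaning "keep nothing", if no cutoff gives F1 > 0."""
    best_cutoff, best_f1 = float("inf"), 0.0
    for cutoff in sorted({s for _, s in scored}):
        kept = [item for item, s in scored if s >= cutoff]
        p, r = precision_recall(kept, relevant)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:
            best_cutoff, best_f1 = cutoff, f1
    return best_cutoff, best_f1


# Hypothetical scores for one query. Keys could also be (query, sentence_id)
# tuples, which pools several curated queries into a single cutoff.
scored = [("s1", 0.91), ("s2", 0.84), ("s3", 0.42), ("s4", 0.35)]
relevant = {"s1", "s2"}
cutoff, f1 = best_threshold(scored, relevant)
print(cutoff, f1)
```

To avoid overfitting the cutoff to the same sentences used for reporting accuracy, it would be safer to choose it on one part of the curated set and measure precision and recall on the rest. The same `precision_recall` function also gives comparable numbers across the approaches listed above.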
Any help would be appreciated. Thanks in advance!