Are there any good NLP APIs for comparing strings in terms of semantic similarity?

I want to create a chatbot that informs the user about traffic on the streets, though not in real time for the moment. I have created a small MySQL database that stores some traffic data, and I fetch it with a PHP script whenever appropriate, depending on the user's interaction with the chatbot.

I wonder how to deal with the case when the user asks variations of the same question which therefore can be answered with the same answer. For example:

  • Why is there traffic at High Street?
  • What is the cause of traffic at High Street?
  • Why did I encounter traffic at High Street?
  • I am stuck in traffic at High Street. Why is this?

Obviously, I can start by removing stopwords (e.g. did), by naming entities (e.g. road -> High Street), by defining synonyms and by applying a text similarity measure (e.g. Levenshtein distance).

However, I feel like I would be reinventing the wheel if I did this. Therefore, my question is:

Are there any APIs which can compare strings in terms of semantic similarity (without even requiring training)?

I know that there are software platforms such as Dialogflow which are suitable for these tasks, but you must still explicitly state all the variations of the same question in order to get the same answer. Therefore, I am looking for an API where you explicitly state only one of these variations (e.g. Why is there traffic at High Street?) and the API then figures out by itself whether other variations are identical to it in meaning or not.

Topic software-recommendation nlp python similarity machine-learning

Category Data Science


I can give you some hints on how to do this with deep learning approaches.

It's easy to use the gensim and sklearn Python libraries. First, you need to extract word embeddings, which are vectors of numbers that each represent a word. Taking the average of the word vectors within a sentence is then one way of finding a vector representation for that sentence.

So extract your word embeddings using this guideline here. After that, try cosine similarity using sklearn to compare how closely the sentences are related.
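
As a rough sketch of the averaging idea, here is a minimal example. The tiny 3-dimensional vectors in TOY_EMBEDDINGS are made up purely for illustration (real ones would come from a trained model, e.g. one loaded with gensim), and the helper names sentence_vector and cosine_similarity are my own, not part of any library. The cosine is computed with numpy to keep the example self-contained; sklearn provides an equivalent function.

```python
import re
import numpy as np

# Hypothetical toy embeddings, invented for this example only.
# In practice these vectors would come from a trained embedding model.
TOY_EMBEDDINGS = {
    "why": np.array([0.1, 0.9, 0.0]),
    "cause": np.array([0.2, 0.8, 0.1]),
    "traffic": np.array([0.9, 0.1, 0.2]),
    "high": np.array([0.3, 0.2, 0.9]),
    "street": np.array([0.4, 0.1, 0.8]),
    "pizza": np.array([-0.8, 0.1, -0.3]),
    "recipe": np.array([-0.6, 0.3, -0.5]),
}


def sentence_vector(text):
    """Average the embeddings of the known words in a sentence."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Words without an embedding are simply skipped.
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vectors:
        raise ValueError("no known words in: " + repr(text))
    return np.mean(vectors, axis=0)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0


q1 = sentence_vector("Why is there traffic at High Street?")
q2 = sentence_vector("What is the cause of traffic at High Street?")
q3 = sentence_vector("Pizza recipe")

print("q1 vs q2:", cosine_similarity(q1, q2))  # paraphrases
print("q1 vs q3:", cosine_similarity(q1, q3))  # unrelated text
```

With these made-up vectors the two traffic questions score close to 1 and the unrelated sentence scores much lower. With real embeddings the gap will be less clean, so you would still need to pick a threshold on your own data.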


You can use the Universal Sentence Encoder from Google and calculate the similarity between texts using the cosine similarity or angular distance between their vector representations.
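
To make the two measures concrete, here is a small sketch in plain Python. The vectors u and v are made-up stand-ins for what an encoder would return (real sentence vectors have hundreds of dimensions), and the function names are my own. I am assuming the common definition of angular distance as the angle between the vectors divided by pi, which gives a value between 0 and 1.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def angular_distance(a, b):
    """Angle between the vectors, scaled to [0, 1] (assumed definition)."""
    cos = cosine_similarity(a, b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos = max(-1.0, min(1.0, cos))
    return math.acos(cos) / math.pi


# Made-up vectors standing in for two sentence embeddings.
u = [0.2, 0.7, 0.1]
v = [0.25, 0.6, 0.2]

print("cosine similarity:", cosine_similarity(u, v))
print("angular distance:", angular_distance(u, v))
```

Both measures rank pairs in the same order. The angular form spreads out values for vectors that are already very close, which can make thresholds easier to set.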


I guess what you are looking for is semantic similarity. You can try it with spaCy here; otherwise, you can go with cosine similarity from sklearn.

Hope this helps. If anyone has corrections or other suggestions, I'd be happy to be corrected.


Use the LSA (Latent Semantic Analysis) algorithm for semantic similarity. It should be useful for your requirement.
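
For what it's worth, here is a minimal sketch of the LSA idea using only numpy: build a document-term count matrix, take a truncated SVD, and compare documents in the reduced space. The four-sentence corpus, the choice of k = 2 and the variable names are all assumptions made for illustration. A real setup would use a much larger corpus and usually TF-IDF weighting instead of raw counts.

```python
import numpy as np

# Tiny made-up corpus: two traffic sentences and two unrelated ones.
docs = [
    "traffic accident high street",
    "traffic jam high street",
    "pizza cheese recipe",
    "pizza dough recipe",
]

# Build a document-term matrix of raw word counts.
vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}
X = np.zeros((len(docs), len(vocab)))
for row, doc in enumerate(docs):
    for word in doc.split():
        X[row, index[word]] += 1.0

# Truncated SVD: keep only the k largest singular values ("topics").
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_vectors = U[:, :k] * S[:k]  # each row is a document in LSA space


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0


print("traffic vs traffic:", cosine(doc_vectors[0], doc_vectors[1]))
print("traffic vs pizza:", cosine(doc_vectors[0], doc_vectors[2]))
```

In this toy corpus the two traffic sentences end up pointing in the same direction even though one says "accident" and the other "jam", because they share the rest of their context. Note that LSA does need a corpus to learn from, so it is not entirely training-free.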
