Is there any text similarity database available for phrases?

I want to train my application to measure phrase similarity. The model should predict a similarity score for pairs of phrases such as the following examples:

International Business Machines = I.B.M
Synergy Telecom = SynTel
Beam inc = Beam Incorporate
Sir J J Smith = Johnson Smith
Alex, Julia = J Alex
James B. D. Joshi = James Joshi
James Beaty, Jr. = Beaty

Is there any dataset available to train this type of model?

Topic deep-learning nlp machine-learning

Category Data Science


This seems to correspond to entity linking or possibly named entity coreference. You might find some datasets here.


This is a difficult problem, but definitely worth exploring.

An interesting resource to look into is DBpedia. It aims to extract structured information from the Wikipedia project. It is available under a free license (CC-BY-SA).

You can conveniently explore the project online, e.g.:

Note that you are restricted to the extensive but finite knowledge in Wikipedia; for example, Synergy Telecom/SynTel does not seem to have an entry. Some creativity will be required to overcome this limitation.
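One possible way to turn DBpedia into training pairs is to collect the labels of pages that redirect to an entity, since redirects often capture abbreviations and alternate spellings. The following is only a rough sketch under assumptions: it assumes the public SPARQL endpoint at https://dbpedia.org/sparql is reachable and that the `dbo:wikiPageRedirects` property is populated for the entity of interest. The function names (`build_alias_query`, `fetch_aliases`, `to_positive_pairs`) and the 1.0 similarity label are hypothetical choices for illustration, not part of DBpedia.

```python
import json
import urllib.parse
import urllib.request

# Public DBpedia SPARQL endpoint (assumed to be reachable from your machine).
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"


def build_alias_query(resource: str, limit: int = 50) -> str:
    """Build a SPARQL query for English labels of pages redirecting to `resource`."""
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?alias WHERE {{
  ?page dbo:wikiPageRedirects <http://dbpedia.org/resource/{resource}> ;
        rdfs:label ?alias .
  FILTER (lang(?alias) = "en")
}} LIMIT {limit}
""".strip()


def build_request_url(query: str) -> str:
    """Encode the query as a GET request asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_SPARQL + "?" + urllib.parse.urlencode(params)


def fetch_aliases(resource: str, limit: int = 50) -> list:
    """Run the query over the network and return the alias strings."""
    url = build_request_url(build_alias_query(resource, limit))
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = json.load(resp)
    return [b["alias"]["value"] for b in data["results"]["bindings"]]


def to_positive_pairs(canonical: str, aliases: list) -> list:
    """Pair the canonical name with each distinct alias; 1.0 is an assumed label."""
    return [(canonical, alias, 1.0) for alias in aliases if alias != canonical]
```

Calling `to_positive_pairs("IBM", fetch_aliases("IBM"))` would then give candidate positive examples for that entity. Negative pairs are not covered here; they would have to be sampled separately, for instance by pairing aliases of different entities.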
