Is there any text similarity database available for phrases?

Question

Mohit Saini

May 25, 2022, 09:01

I want to train a model for phrase similarity. The model should predict a similarity score for pairs of phrases, as in the examples below:

International Business Machines = I.B.M
Synergy Telecom = SynTel
Beam inc = Beam Incorporate
Sir J J Smith = Johnson Smith
Alex, Julia = J Alex
James B. D. Joshi = James Joshi
James Beaty, Jr. = Beaty

Is there any dataset available to train this type of model?

Topic deep-learning nlp machine-learning

Category Data Science

Erwan · Accepted Answer · August 10, 2020, 15:55

This seems to correspond to entity linking or possibly named entity coreference. You might find some datasets here.

Simon · Accepted Answer · August 10, 2020, 15:24

This is a difficult problem, but definitely worth exploring.

An interesting resource to look into is DBpedia. It aims to extract structured information from the Wikipedia project. It is available under a free license (CC-BY-SA).

You can conveniently explore the project online, e.g.:

Note that you are restricted to the extensive but finite knowledge in Wikipedia; for example, Synergy Telecom/SynTel does not seem to have an entry. You would need some creativity to overcome this limitation.
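One possible way to turn DBpedia into phrase pairs is to collect Wikipedia redirects, since a redirect title is often an alternative name for the target entity. The sketch below is only an illustration under a few assumptions: it uses the public SPARQL endpoint at https://dbpedia.org/sparql and the `dbo:wikiPageRedirects` property, it assumes the entity can be found by its English `rdfs:label`, and the function names (`build_alias_query`, `build_request_url`, `alias_from_uri`, `fetch_alias_pairs`) are made up for this example. Redirects also include misspellings and loosely related titles, so the resulting pairs would need filtering before training.

```python
import json
import urllib.parse
import urllib.request

# Public DBpedia SPARQL endpoint (assumed to be reachable and unchanged).
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def build_alias_query(label: str, limit: int = 50) -> str:
    """Build a SPARQL query for pages that redirect to the entity with this English label."""
    # Escape backslashes and double quotes so the label is a valid SPARQL string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?redirect WHERE {\n"
        f'  ?entity rdfs:label "{escaped}"@en .\n'
        "  ?redirect dbo:wikiPageRedirects ?entity .\n"
        f"}} LIMIT {int(limit)}"
    )


def build_request_url(query: str) -> str:
    """Encode the query as a GET request asking for JSON results."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    return f"{DBPEDIA_ENDPOINT}?{params}"


def alias_from_uri(uri: str) -> str:
    """Turn a DBpedia resource URI into a readable phrase."""
    name = uri[len(RESOURCE_PREFIX):] if uri.startswith(RESOURCE_PREFIX) else uri
    return urllib.parse.unquote(name).replace("_", " ")


def fetch_alias_pairs(label: str, limit: int = 50) -> list[tuple[str, str]]:
    """Return (label, alias) pairs; this performs a network request."""
    url = build_request_url(build_alias_query(label, limit))
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    rows = payload["results"]["bindings"]
    return [(label, alias_from_uri(row["redirect"]["value"])) for row in rows]
```

Calling `fetch_alias_pairs("IBM")` would give candidate positive pairs for that entity. Negative pairs could be sampled by pairing aliases of different entities, and names missing from Wikipedia, such as Synergy Telecom/SynTel, would still need another source.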
