Training New Entities in spaCy NER Model

I want to add new entities to the Python spaCy NER module. I have a few questions about this.

  1. Is it possible to remove some of the existing entities and add new entities to the remaining ones?

  2. While training new entities, I found that we have to provide training data in a particular format. For example:

    data = [ ("I love chicken", [(7, 14, "FOOD")]), ... ]

Instead of full sentences like "I love chicken", is it possible to give data like

    data = [
        ("chicken", [(0, 7, "FOOD")]),
        ...
    ]

Will this affect accuracy?



This is not how NER works:

  • NER is not about recognizing only a fixed set of entities; it is about detecting any entity in a text. For example, it should detect an entity in "Mr X said ..." whether X is "John Smith" or "Donald Duck".
  • This implies that NER uses clues from the sentence in order to detect entities, as opposed to just tagging entities that it knows from training.

Therefore

  1. Is it possible to remove some of the existing entities and add new entities to the remaining ones?

No, because the NER model is not a list of entities. It is a complex model that uses features of the text.

  2. Is it possible to give data like [...]

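Technically you can give data like this, but it's not the way it's supposed to work: a single word with no surrounding sentence gives the model none of the context clues it relies on.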
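To keep that context, each training example would normally be a whole sentence annotated with character offsets. Here is a minimal sketch, assuming the simplified tuple format from the question and a hypothetical helper `make_example` (not part of spaCy) that computes the offsets instead of counting them by hand. Offsets are zero-based and the end is exclusive, as in Python slicing.

```python
def make_example(sentence, entity_text, label):
    """Build (sentence, [(start, end, label)]) with character offsets.

    Hypothetical helper for illustration; only the first occurrence
    of entity_text in the sentence is annotated.
    """
    start = sentence.find(entity_text)
    if start == -1:
        raise ValueError(f"{entity_text!r} not found in {sentence!r}")
    end = start + len(entity_text)  # exclusive end, like a Python slice
    return (sentence, [(start, end, label)])


# Full sentences keep the context around each entity.
data = [
    make_example("I love chicken", "chicken", "FOOD"),
    make_example("We had pizza for dinner", "pizza", "FOOD"),
]

for sentence, spans in data:
    for start, end, label in spans:
        # Sanity check: the slice must reproduce the entity text.
        print(label, repr(sentence[start:end]), (start, end))
```

The sanity check at the end is worth keeping in any real pipeline, since off-by-one offsets are an easy mistake to make in this format.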

You seem to be interested in matching exactly a set of predefined entities, and NER is not the right tool for this. In this scenario it's much simpler to store your entities in a list and apply string matching to the text.
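The list-plus-string-matching approach could look something like the sketch below. It uses only the standard `re` module; the `ENTITIES` dictionary and the `find_entities` function are made-up names for illustration, and the matching is case-insensitive on whole words, which may or may not suit your data.

```python
import re

# Hypothetical entity list: lowercase surface form -> label.
ENTITIES = {
    "chicken": "FOOD",
    "pizza": "FOOD",
    "new york": "CITY",
}

# Try longer names first so they win over shorter ones sharing a prefix.
_alternatives = "|".join(
    re.escape(name) for name in sorted(ENTITIES, key=len, reverse=True)
)
_pattern = re.compile(r"\b(" + _alternatives + r")\b", re.IGNORECASE)


def find_entities(text):
    """Return (start, end, label) for every known entity found in text."""
    return [
        (m.start(), m.end(), ENTITIES[m.group(0).lower()])
        for m in _pattern.finditer(text)
    ]


print(find_entities("I love chicken and pizza in New York"))
```

Unlike a trained NER model, this only finds the exact strings in the list, which is the behaviour the question appears to be asking for.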
