spaCy v2.0.1 custom NER: how to improve training of an existing model
I implemented a custom NER model for the first time, using the training data below, and it predicts Name and PrdName well. Here is the code:
import random
from pathlib import Path

import spacy

if __name__ == '__main__':
    TRAIN_DATA = [
        ('My Name is Rajesh', {'entities': [(11, 17, 'Name')]}),
        ('My Name is Bakul', {'entities': [(11, 16, 'Name')]}),
        ('My Name is Pritam', {'entities': [(11, 17, 'Name')]}),
        ('My Name is Rakesh', {'entities': [(11, 17, 'Name')]}),
        ('My Name is Jayeeta', {'entities': [(11, 18, 'Name')]}),
        ('this is the price of bag', {'entities': [(21, 24, 'PrdName')]}),
        ('what is the price of ball?', {'entities': [(21, 25, 'PrdName')]}),
        ('what is the price of jegging?', {'entities': [(21, 28, 'PrdName')]}),
        ('what is the price of t-shirt?', {'entities': [(21, 28, 'PrdName')]}),
    ]
    iterations = 20
    try:
        model = 'live_ner_model'
        nlp = spacy.load(model)  # load existing spaCy model
    except OSError:  # raised when the model directory cannot be found
        model = None
        print("Exception")
        nlp = spacy.blank('en')  # create blank Language class
        print("Created blank 'en' model")
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner)
        print("Created NER")
    else:
        ner = nlp.get_pipe('ner')
        print("Existing NER")
    # Add new entity labels to entity recognizer
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get('entities'):
            ner.add_label(ent[2])
    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        optimizer = nlp.begin_training()
        for itn in range(iterations):
            print("Starting iteration " + str(itn))
            random.shuffle(TRAIN_DATA)
            losses = {}
            for text, annotations in TRAIN_DATA:
                nlp.update(
                    [text],  # batch of texts
                    [annotations],  # batch of annotations
                    drop=0.2,  # dropout - make it harder to memorise data
                    sgd=optimizer,  # callable to update weights
                    losses=losses)
            print(losses)
    # Save model
    output_dir = 'live_ner_model'
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.meta['name'] = model  # rename model
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)
    # Test the saved model
    output_dir = 'live_ner_model'
    print("Loading from", output_dir)
    nlp2 = spacy.load(output_dir)
    test_text = """
    what is the price of cup. My Name is Rahim
    """
    doc2 = nlp2(test_text)
    for ent in doc2.ents:
        print(ent.label_, ent.text)
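The character offsets in the training data are counted by hand, which is easy to get wrong when adding new examples. A minimal sketch of a helper that derives them from the entity text instead; `make_example` is a hypothetical helper written for illustration (it is not part of spaCy), and it assumes the spans are listed in the order they appear in the text. It returns the same `(text, {'entities': [(start, end, label)]})` format used in the training data.

```python
from typing import Dict, List, Tuple

Entity = Tuple[int, int, str]


def make_example(text: str, spans: List[Tuple[str, str]]) -> Tuple[str, Dict[str, List[Entity]]]:
    """Build a (text, {'entities': [...]}) pair from (substring, label) pairs.

    Hypothetical helper: each substring is searched for after the end of the
    previous match, so spans must be given left to right.
    """
    entities: List[Entity] = []
    search_from = 0
    for substring, label in spans:
        start = text.find(substring, search_from)
        if start == -1:
            raise ValueError("%r not found in %r" % (substring, text))
        end = start + len(substring)  # end offset is exclusive, as in TRAIN_DATA
        entities.append((start, end, label))
        search_from = end
    return text, {'entities': entities}


if __name__ == '__main__':
    print(make_example('My Name is Bakul', [('Bakul', 'Name')]))
    print(make_example('what is the price of cup. My Name is Rahim',
                       [('cup', 'PrdName'), ('Rahim', 'Name')]))
```

For example, `make_example('My Name is Bakul', [('Bakul', 'Name')])` yields the same offsets `(11, 16)` as the hand-written entry.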
But when I continue training the existing model with new data that contains only PrdName entities, or other new entities but no Name entities, the Name predictions go wrong. I think this happens because the new training data does not include any Name entities.

Is there a way to continue training without degrading what the model has already learned? Can someone share an idea, and if possible some sample code?
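One commonly suggested way to reduce this kind of forgetting is pseudo-rehearsal: before updating, let the existing model label some old-style texts (and fill in old labels on the new texts), then train on the mix so the Name examples keep appearing. Below is a minimal sketch of only the data-preparation step, in plain Python; `build_rehearsal_data`, `merge_entities` and the `predict_entities` callable are hypothetical names for illustration, and the approach assumes the existing model's predictions are mostly correct. New gold spans always win; a predicted span is added only where it overlaps no gold span.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Entity = Tuple[int, int, str]  # (start_char, end_char, label)
Example = Tuple[str, Dict[str, List[Entity]]]


def merge_entities(gold: List[Entity], predicted: List[Entity]) -> List[Entity]:
    """Keep every gold span; add predicted spans only where they overlap no gold span."""
    merged = list(gold)
    for start, end, label in predicted:
        if all(end <= g_start or start >= g_end for g_start, g_end, _ in gold):
            merged.append((start, end, label))
    return sorted(merged)  # sorted by start offset


def build_rehearsal_data(
    new_data: Sequence[Example],
    old_texts: Sequence[str],
    predict_entities: Callable[[str], List[Entity]],
    seed: int = 0,
) -> List[Example]:
    """Mix new gold examples with examples labelled by the existing model (hypothetical helper)."""
    combined: List[Example] = []
    # New examples: keep the new gold spans, add old-label spans the model already finds.
    for text, annotations in new_data:
        gold = list(annotations.get('entities', []))
        combined.append((text, {'entities': merge_entities(gold, predict_entities(text))}))
    # Revision examples: old-style texts labelled by the existing model.
    for text in old_texts:
        combined.append((text, {'entities': sorted(predict_entities(text))}))
    random.Random(seed).shuffle(combined)
    return combined


if __name__ == '__main__':
    # Stand-in predictor for this demo only; it "finds" the name Rajesh.
    def fake_predict(text: str) -> List[Entity]:
        i = text.find('Rajesh')
        return [(i, i + 6, 'Name')] if i != -1 else []

    new = [('price of cup, My Name is Rajesh', {'entities': [(9, 12, 'PrdName')]})]
    for example in build_rehearsal_data(new, ['My Name is Rajesh'], fake_predict):
        print(example)
```

With spaCy v2, the predictor could be built from the model loaded before training, e.g. `lambda text: [(e.start_char, e.end_char, e.label_) for e in nlp(text).ents]`, and all predictions should be computed before the first `nlp.update` call so they reflect the old model. The resulting list can then replace `TRAIN_DATA` in the training loop.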
Environment: Anaconda, spacy=v2.0.1, python=3.7
Tags: spacy, python-3.x, anaconda (Category: Data Science)