Reducing false positives with an annotated named entity recognition model
I am training a NER model to detect particular phrases and slang terms for a bias study of court cases.
Essentially, I have packets of scanned text containing the complete court proceedings.
The model is good at detecting the phrases I want, based on annotations I created from the many cases I have already scanned. However, I am getting false positives for certain phrases.
Here is an example of a phrase I want to tag: "Your honor, my client, the def., pleads guilty."
Here is a false positive it has detected: "You are def guilty, said the judge."
It seems that in many cases "def" gets tagged incorrectly. I have not fed the model any training documents where this slang use of "def" appears, and my guess is that this is where my problem lies. I have trained the model only on annotated data; I have not provided it any other data, such as unannotated text documents or readings.
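If the fix is to add negative examples (sentences where the slang "def" appears but nothing is annotated), I imagine the training update would look roughly like the sketch below. This is a minimal sketch assuming a spaCy 3.x pipeline with a trainable "ner" component; the model name, the DEFENDANT label, and the character offsets are placeholders, not my actual setup:

```python
# Minimal sketch, assuming spaCy 3.x. "en_core_web_sm", the DEFENDANT
# label, and the character offsets are illustrative placeholders.
import spacy
from spacy.training import Example

nlp = spacy.load("en_core_web_sm")  # stand-in for my custom model
nlp.get_pipe("ner").add_label("DEFENDANT")

train_data = [
    # Positive example: the courtroom abbreviation "def" is annotated.
    ("Your honor, my client, the def., pleads guilty.",
     {"entities": [(27, 30, "DEFENDANT")]}),
    # Negative example: the slang "def" carries no annotation at all,
    # so every token in the sentence is labeled as outside any entity.
    ("You are def guilty, said the judge.",
     {"entities": []}),
]

examples = [
    Example.from_dict(nlp.make_doc(text), annots)
    for text, annots in train_data
]

optimizer = nlp.resume_training()
for _ in range(10):  # a few illustrative passes over the data
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)
```

My understanding is that an example with an empty entities list tells the model every token in that sentence is outside any entity, which is what would make it a true negative example rather than just missing data.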
What do you think I can do to reduce false positives?
Topic data-science-model named-entity-recognition machine-learning
Category Data Science