Best practice for the number of manual annotations needed to build criminal detection from news articles?
We have a corpus of 7 million news articles that we want to classify as crime or non-crime, and then identify the criminals involved by manually annotating criminal names and crimes for NER. To build a model that identifies criminals, how many annotated articles do we need to train it on? Is there an industry best practice for this count? Is there a better way to arrive at the size of the training (annotated) dataset than random guessing? Are there any best-practice resources anyone can point to? Thanks in advance!
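For context, one empirical alternative to guessing that we are aware of is to annotate a small pilot batch, fit a baseline model on growing fractions of it, and plot a learning curve: if the validation score is still rising at the full pilot size, more annotation is likely to pay off. Below is a minimal sketch of that idea using scikit-learn's `learning_curve`; the `make_classification` data is a stand-in for our real features (e.g. TF-IDF vectors of articles) and labels, and the logistic-regression baseline is an assumption, not a recommendation.

```python
# Sketch: estimate whether more annotation would help by plotting a
# learning curve on an initial labelled pilot batch.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Placeholder for a pilot batch of ~2,000 annotated articles.
# In practice X would be e.g. TF-IDF vectors of the article text and
# y the manual crime / non-crime labels.
X, y = make_classification(n_samples=2000, n_features=300,
                           n_informative=50, random_state=0)

# Train on 10%..100% of the pilot batch and cross-validate each size.
train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8),
    cv=5, scoring="f1", n_jobs=-1)

# If the validation F1 is still climbing at the largest training size,
# annotate more; if it has plateaued, returns are diminishing.
for n, score in zip(train_sizes, val_scores.mean(axis=1)):
    print(f"{n:5d} annotated examples -> mean CV F1 = {score:.3f}")
```

Would a learning-curve approach like this (or something like active learning) be the standard way to size the annotation effort, or is there a better rule of thumb?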
Tags: annotation, data-science-model, nlp, machine-learning