How to train a machine learning model for named entity recognition

Question

Hing

May 10, 2022 09:46

I cannot find any sources about the architectures of machine learning models for solving NER problems. I vaguely know it is a multiclass classification problem, but how should the input be formatted to feed into such a multiclass classifier? I know the input must be an annotated corpus, but how can that collection of (word, entity label) pairs be fed into the classifier? In other words, how do you feature-engineer such a corpus for ML models? Or, more generally, how can you train a custom NER model from scratch with machine learning?

TIA.

Topic named-entity-recognition nlp machine-learning

Category Data Science

Erwan · Accepted Answer · May 10, 2022 09:46

There are actually many libraries for training NER models.

It's useful to know that this type of model/task is called sequence labeling, because it consists of predicting a label for every word while taking into account the other words close to the target word.
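
To make this concrete for the question of how to feed (word, label) pairs to a classifier: a common approach is to turn every word into its own training example, described by features of the word itself and of its neighbours. The sketch below assumes sentences are already tokenized; the helper names and feature names (`word.lower`, `suffix3`, the `-1:`/`+1:` prefixes) are hypothetical choices for illustration, not a fixed standard.

```python
from typing import Dict, List, Tuple

def token_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, object]:
    """Build a feature dict for tokens[i] from the word itself and its neighbours."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),  # capitalization is a strong NER cue
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
    }
    # Context words within +/- window positions; boundary markers outside the sentence.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"{offset:+d}:word.lower"  # e.g. "-1:word.lower", "+1:word.lower"
        if 0 <= j < len(tokens):
            feats[key] = tokens[j].lower()
        else:
            feats[key] = "<BOS>" if j < 0 else "<EOS>"
    return feats

def sentence_to_examples(
    tokens: List[str], labels: List[str], window: int = 1
) -> List[Tuple[Dict[str, object], str]]:
    """Turn one annotated sentence into (features, label) pairs, one per token."""
    if len(tokens) != len(labels):
        raise ValueError("each token needs exactly one label")
    return [(token_features(tokens, i, window), labels[i]) for i in range(len(tokens))]

tokens = ["John", "lives", "in", "New", "York"]
labels = ["B-PER", "O", "O", "B-LOC", "I-LOC"]
examples = sentence_to_examples(tokens, labels)
print(examples[3])  # features of "New" (including its neighbours "in" and "york") and its label
```

Each feature dict can be vectorized (for example with scikit-learn's `DictVectorizer`) and given to an ordinary multiclass classifier. Doing so treats each word independently, though, so nothing prevents the model from predicting label sequences that make no sense together; sequence models such as CRFs address exactly that.
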
The standard method is Conditional Random Fields (CRF), which is implemented in various libraries; see for example this answer.
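
What a linear-chain CRF adds over the per-word classifier is a score for each transition between adjacent labels, and at prediction time it picks the best-scoring label sequence as a whole. The sketch below shows only that decoding step (the Viterbi algorithm) in plain Python. The emission and transition scores are made-up numbers standing in for what a trained CRF would compute from its weighted features; learning those weights is what the CRF libraries do, and it is not shown here. The `"<START>"` pseudo-label is an assumption of this sketch.

```python
import math
from typing import Dict, List, Tuple

NEG_INF = -math.inf

def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    """Return the label sequence maximizing the sum of emission and transition scores.

    Missing transitions default to 0.0; ("<START>", y) scores the first label.
    """
    # best[t][y]: best total score of any label path that ends in label y at position t
    best = [{y: emissions[0][y] + transitions.get(("<START>", y), 0.0) for y in labels}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            # Choose the best previous label p for reaching y at position t.
            prev, score = max(
                ((p, best[t - 1][p] + transitions.get((p, y), 0.0)) for p in labels),
                key=lambda pair: pair[1],
            )
            best[t][y] = score + emissions[t][y]
            back[t][y] = prev
    # Follow the back-pointers from the highest-scoring final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

labels = ["O", "B-LOC", "I-LOC"]
emissions = [
    {"O": 1.0, "B-LOC": 0.8, "I-LOC": 0.1},  # scores for "New"
    {"O": 0.2, "B-LOC": 0.5, "I-LOC": 1.5},  # scores for "York"
]
transitions = {
    ("<START>", "I-LOC"): NEG_INF,  # a sentence cannot start inside an entity
    ("O", "I-LOC"): NEG_INF,        # I-LOC must follow B-LOC or I-LOC
}
greedy = [max(labels, key=lambda y: e[y]) for e in emissions]
print(greedy)                                   # ['O', 'I-LOC']  (not valid BIO)
print(viterbi(emissions, transitions, labels))  # ['B-LOC', 'I-LOC']
```

Picking each word's best label independently yields an invalid sequence here, while the transition scores let the decoder trade a slightly worse label on "New" for a much better sequence overall.
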
Traditionally, the input uses a specific format called BIO (sometimes IOB), which stands for Begin, Inside, Outside: a word is tagged B-X if it begins an entity of type X, I-X if it continues one, and O if it is outside any entity (see a very short example). The features can include context words through custom patterns (see the documentation of the libraries for details).
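
As an illustration, here is one way to produce BIO tags from a corpus annotated as entity spans, and to turn predicted tags back into entities. The span representation (token indices, end exclusive) and the helper names are assumptions for this sketch; real datasets and annotation tools use various formats, such as column-based CoNLL files. Treating a stray `I-` tag as the start of a new entity is one common convention, not the only one.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start token index, end token index exclusive, entity type)

def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping entity spans as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"bad span {(start, end, etype)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError("overlapping spans cannot be encoded in plain BIO")
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags

def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into (start, end, type) spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    etype = ""
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = start is not None and tag == f"I-{etype}"
        if start is not None and not continues:
            spans.append((start, i, etype))
            start = None
        if tag.startswith("B-") or (tag.startswith("I-") and not continues):
            start, etype = i, tag[2:]  # a stray I- tag also opens a new entity
    return spans

tokens = ["John", "lives", "in", "New", "York"]
tags = spans_to_bio(len(tokens), [(0, 1, "PER"), (3, 5, "LOC")])
print(list(zip(tokens, tags)))
# [('John', 'B-PER'), ('lives', 'O'), ('in', 'O'), ('New', 'B-LOC'), ('York', 'I-LOC')]
print(bio_to_spans(tags))  # [(0, 1, 'PER'), (3, 5, 'LOC')]
```

The B tag is what makes two adjacent entities of the same type distinguishable: "B-PER B-PER" decodes to two one-word persons, whereas "B-PER I-PER" decodes to a single two-word person.
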
