Named Entity Recognition with BIO Tagging

I'm trying to implement NER using BIO annotation. For example:

I went to the United States  
[O, O, O, B, I, I]

where 'B' marks the beginning of an entity, 'I' marks the tokens that continue it, and 'O' marks tokens outside any entity.

However, when I use a vanilla BERT to classify each position of the sequence (as 'B', 'I', or 'O'), I encounter predictions where 'O' is followed by an 'I'. There are no cases in the data that exhibit the ('O', 'I') pattern, since an 'I' always has a 'B' or 'I' in front of it. Obviously, nothing forces the model to exclude such a pattern, but I would like to somehow build that constraint into the model (for example, a transition probability of 0 from 'O' to 'I').

I took a look at putting a conditional random field on top of BERT, which tries to do something very similar, but the predictions somehow still contained these 'O' 'I' patterns.



It may be worth looking into the Viterbi algorithm. Instead of picking the best tag for each token independently, it scores pairs of adjacent tags and finds the best-scoring tag sequence as a whole. You can then construct your transition matrix (i.e. the conditional probability of tag Y following tag X) so that transitions from O to I are not allowed (probability 0).
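The constrained decoding described above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the function name `constrained_viterbi`, the tag order O, B, I, and the use of a large negative number in place of log(0) are all assumptions. It also assumes you already have per-token log-scores from the classifier (for example, the log-softmax of the BERT logits), and it leaves every transition other than O to I unweighted.

```python
import numpy as np

TAGS = ["O", "B", "I"]   # assumed column order of the emission scores
NEG_INF = -1e9           # stands in for log(0), i.e. a forbidden move


def constrained_viterbi(emissions):
    """Return the best tag sequence that never contains O followed by I.

    emissions: array-like of shape (seq_len, 3) holding per-token
    log-scores for the tags O, B, I (in that order).
    """
    emissions = np.asarray(emissions, dtype=float)
    n, k = emissions.shape

    # transition[i, j] is the log-score of moving from tag i to tag j.
    # 0 means "allowed with no extra cost"; only O -> I is forbidden.
    transition = np.zeros((k, k))
    transition[TAGS.index("O"), TAGS.index("I")] = NEG_INF

    # A sequence starting with I has the same problem, so forbid it too.
    start = np.zeros(k)
    start[TAGS.index("I")] = NEG_INF

    score = start + emissions[0]
    backptr = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        # candidate[i, j]: best path ending in tag i at t-1, then tag j at t
        candidate = score[:, None] + transition + emissions[t][None, :]
        backptr[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)

    # Follow the back-pointers from the best final tag.
    best = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return [TAGS[i] for i in reversed(best)]


# Toy scores: taking the argmax per token would give O, I here.
toy = np.array([[-0.7, -0.9, -3.0],
                [-3.0, -3.0, -0.1]])
print(constrained_viterbi(toy))
```

In this toy input the second token is strongly an I, so the decoder prefers to relabel the first token as B rather than drop the entity. The decision uses the scores of both tokens instead of a fixed rewrite rule.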

One thing to keep in mind is that an I tag predicted right after an O, with no B tag, essentially has less evidence behind it than a regular entity, so post-processing methods that keep such spans can result in a higher false positive rate.


In my experience these cases are almost unavoidable in practice, but it's not a real problem:

  • The sequential model learns to classify each token as B, I or O based on the features. Occasionally it might find a case in the test data (sometimes even in the training data if it's noisy) where the most likely class based on the features is I even though the previous token is O. Normally this is rare, since the model didn't see any O-I sequence in the training data.
  • It doesn't really matter: one can easily run a post-processing script which converts any O-I sequence into O-B if necessary.
  • Note that if you really don't want to obtain this kind of inconsistency, you could opt for a different tagging scheme, for example only O and I. However, in regular NE tasks the indicators which capture the start of an entity are important and might differ from the ones which characterize a continuing entity, so this would probably decrease performance. Other schemes exist as well, for example BILOU, but since it's more complex it tends to cause more inconsistencies in general.
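The post-processing step mentioned in the second point could look something like the sketch below. The function name `repair_bio` is made up for illustration, and it assumes the tags are plain 'B', 'I', 'O' strings without entity types. It also treats an I at the very start of a sequence like an I after an O.

```python
def repair_bio(tags):
    """Rewrite any I that does not follow a B or I into a B."""
    fixed = []
    prev = "O"  # treat the start of the sequence like an O
    for tag in tags:
        if tag == "I" and prev == "O":
            tag = "B"  # O-I becomes O-B
        fixed.append(tag)
        prev = tag  # compare against the repaired tag, not the raw one
    return fixed


print(repair_bio(["O", "I", "I", "O", "B", "I"]))
# ['O', 'B', 'I', 'O', 'B', 'I']
```

Because each tag is compared with the already repaired previous tag, only the first I of an orphaned run is changed and the rest of the run stays I.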
