How to deal with annotation errors?

I know my annotators are not perfect and sometimes make mistakes. What would be the best way to deal with annotation errors in my training data?

Topic: annotation, data-science-model, training

Category: Data Science


It's very common to have some amount of errors or inconsistencies in a dataset. Sometimes these inconsistencies are not even errors: in some subjective tasks (e.g. translation), annotators may simply disagree about what the best answer is.

What to do with this kind of noise depends entirely on the case at hand. If the noise caused by these errors represents a reasonably small proportion of the data, it can usually be ignored safely: in this case it's up to the learning algorithm to distinguish the relevant patterns from the noise. Otherwise, ad-hoc pre-processing can be implemented to clean up the data. In cases where the subjectivity of the annotators plays an important role, it's useful to have several annotators annotate the same data and check the inter-annotator agreement. This can in turn be used to filter out the least consensual instances, or to aggregate the annotations in some way (e.g. majority voting).
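As an illustration, here is a minimal sketch of those last two ideas in Python: pairwise agreement via scikit-learn's `cohen_kappa_score`, and majority-vote aggregation with a consensus threshold. The toy labels, the three annotators, and the threshold of 2 out of 3 are assumptions made up for this example, not part of the original question.

```python
from collections import Counter

from sklearn.metrics import cohen_kappa_score

# Labels from three annotators for the same five instances (hypothetical data).
annotations = [
    ["cat", "dog", "dog", "cat", "bird"],  # annotator A
    ["cat", "dog", "cat", "cat", "bird"],  # annotator B
    ["cat", "dog", "dog", "cat", "dog"],   # annotator C
]

# Pairwise Cohen's kappa between annotators A and B
# (1.0 = perfect agreement, 0.0 = no better than chance).
kappa_ab = cohen_kappa_score(annotations[0], annotations[1])
print(f"kappa(A, B) = {kappa_ab:.2f}")

# Aggregate by majority vote, keeping only instances where the most
# frequent label reaches the consensus threshold (assumption: 2 of 3).
aggregated = []
for labels in zip(*annotations):
    label, count = Counter(labels).most_common(1)[0]
    if count >= 2:
        aggregated.append(label)
    else:
        aggregated.append(None)  # flag low-consensus instance for review/removal
print(aggregated)
```

Note that Cohen's kappa only compares two annotators at a time; with more than two, a generalization such as Fleiss' kappa (available in statsmodels as `statsmodels.stats.inter_rater.fleiss_kappa`) is commonly used instead.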
