Detect named entities inside words

Some languages attach endings to their nouns (Finnish, for example: "in Berlin" -> "Berliinissä"). I have tried annotating the entity's characters in the training data, but when I run the model, it doesn't detect those characters inside the word. They are detected only when they form a separate word. I can't think of an implementation that effectively detects named entities within a word. Any suggestions would be helpful.

Topic chatbot named-entity-recognition nlp

Category Data Science


I would recommend looking into character-level named entity recognition. For example: Kuru et al., CharNER: Character-Level Named Entity Recognition, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (2016).

The authors evaluate on several highly inflected languages, including Turkish, so the approach should be adequate for your Finnish use case.

The code is here: https://github.com/ozanarkancan/char-ner

You should hopefully be able to download it and get it running out of the box for training. This assumes you have a tagged NER corpus in Finnish, which you would need to preprocess into the same format as the CSV file that they use for Czech in the repo.
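
To make the character-level idea concrete, here is a minimal Python sketch of how a sub-word entity could be represented as one label per character and then recovered from those labels. It is not taken from the CharNER repository; the function names, the "LOC" tag, and the example sentence are assumptions made purely for illustration, and the predicted labels would come from whatever character-level model you train.

```python
from typing import List, Tuple


def char_labels(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn (start, end, tag) character spans into one label per character.

    Characters outside every span get the label "O".
    """
    labels = ["O"] * len(text)
    for start, end, tag in spans:
        for i in range(start, end):
            labels[i] = tag
    return labels


def labels_to_spans(text: str, labels: List[str]) -> List[Tuple[str, str]]:
    """Merge runs of identical non-"O" labels back into (substring, tag) pairs."""
    spans = []
    start, current = 0, "O"
    # The trailing "O" is a sentinel that flushes an entity ending at the last character.
    for i, label in enumerate(labels + ["O"]):
        if label != current:
            if current != "O":
                spans.append((text[start:i], current))
            start, current = i, label
    return spans


# Hypothetical example: the entity is the stem "Berliini", not the whole word.
text = "Asun Berliinissä"
gold = char_labels(text, [(5, 13, "LOC")])
entities = labels_to_spans(text, gold)
```

Because the labels are attached to characters rather than whitespace-separated tokens, the case ending "ssä" can stay outside the entity while the stem is tagged, which is what a word-level tagger cannot express.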
