Looking for a generalized (extended) lemmatizer

Whenever I lemmatize a compound word in English or German, I get a result that ignores the compound structure. For example, for 'sidekicks' the NLTK WordNet lemmatizer returns 'sidekick', and for 'Eisenbahnfahrer' the NLTK German Snowball stemmer returns 'eisenbahnfahr'. What I need, however, is something that extracts the primary components of compound words: ['side', 'kick'] and, especially, ['eisen', 'bahn', 'fahr'] (or 'fahren', or whatever form the last item takes). I am mainly interested in segmenting German compound words.
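To make the desired output concrete, here is a minimal dictionary-based splitter sketch. It assumes a hypothetical toy lexicon (`LEXICON`) and input that has already been lemmatized; `split_compound` is an illustrative name, not an NLTK function. Among all ways to cover the word completely with lexicon entries, it keeps the one with the most parts, so 'Eisenbahnfahrer' yields three components rather than 'eisenbahn' + 'fahrer'.

```python
from functools import lru_cache

# Hypothetical toy lexicon for illustration only; a real splitter would load
# a full (lemmatized) word list for the language.
LEXICON = frozenset({"side", "kick", "eisen", "bahn", "eisenbahn", "fahr", "fahrer"})


def split_compound(word, lexicon=LEXICON, min_len=3):
    """Split `word` into lexicon entries, preferring the split with the most
    parts; return [word] unchanged if no complete split exists."""
    word = word.lower()
    n = len(word)

    @lru_cache(maxsize=None)
    def best(start):
        # Best segmentation of word[start:] as a tuple, or None if impossible.
        if start == n:
            return ()
        options = []
        for end in range(start + min_len, n + 1):
            piece = word[start:end]
            rest = best(end) if piece in lexicon else None
            if rest is not None:
                options.append((piece,) + rest)
        # Prefer the segmentation with the most parts (ties: first found).
        return max(options, key=len) if options else None

    result = best(0)
    return list(result) if result else [word]


print(split_compound("sidekick"))         # ['side', 'kick']
print(split_compound("Eisenbahnfahrer"))  # ['eisen', 'bahn', 'fahrer']
print(split_compound("sidekicks"))        # ['sidekicks'] (no entry for 'kicks')
```

The "most parts wins" rule is a simplifying assumption: with a realistic lexicon full of short words it will over-split, so a scoring scheme is needed. Reducing the last component ('fahrer') to 'fahr' or 'fahren' would also be a separate lemmatization step.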

I have not been able to find anything of the kind. Such a step in an NLP pipeline would probably not be called a lemmatizer (or would it?). Is there an established name for it?
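This task is commonly referred to as compound splitting or decompounding. One well-known heuristic (Koehn and Knight, 2003) scores every candidate split, including leaving the word unsplit, by the geometric mean of its parts' corpus frequencies, and allows short linking elements such as German "s" or "es" between parts. The sketch below follows that idea under stated assumptions: the frequency table `FREQ`, the linker list `LINKERS`, and the function name `split_by_frequency` are all hypothetical and for illustration only.

```python
import math
from functools import lru_cache

# Hypothetical corpus counts for illustration only; real counts would come
# from a large German corpus.
FREQ = {
    "eisen": 500, "bahn": 800, "fahrer": 300, "fahr": 40,
    "eisenbahn": 200, "arbeit": 900, "amt": 400, "arbeitsamt": 5,
}
# Linking elements ("Fugenelemente") allowed between parts; assumed short list.
LINKERS = ("", "s", "es")


def split_by_frequency(word, freq=FREQ, min_len=3):
    """Return the segmentation (possibly the unsplit word) whose parts have
    the highest geometric mean frequency; unknown words come back unchanged."""
    word = word.lower()
    n = len(word)

    @lru_cache(maxsize=None)
    def segmentations(start):
        # All ways to cover word[start:] with known parts, with an optional
        # linking element between two parts (never at the very end).
        if start == n:
            return ((),)
        results = []
        for end in range(start + min_len, n + 1):
            part = word[start:end]
            if part not in freq:
                continue
            for linker in LINKERS:
                nxt = end + len(linker)
                if not word.startswith(linker, end):
                    continue
                if linker and nxt >= n:
                    continue  # a linker must be followed by another part
                for rest in segmentations(nxt):
                    results.append((part,) + rest)
        return tuple(results)

    def log_geo_mean(parts):
        # Log of the geometric mean = mean of the log frequencies.
        return sum(math.log(freq[p]) for p in parts) / len(parts)

    candidates = segmentations(0)
    if not candidates:
        return [word]
    return list(max(candidates, key=log_geo_mean))


print(split_by_frequency("Eisenbahnfahrer"))  # ['eisen', 'bahn', 'fahrer']
print(split_by_frequency("Arbeitsamt"))       # ['arbeit', 'amt']
```

Because the unsplit word competes on equal terms, a frequent simplex word is not broken into rarer pieces, which addresses the over-splitting problem of a purely dictionary-based approach.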

Topic nltk nlp

Category Data Science
