Software/Library Suggestion: Is there a usable open-source sequence tagger around?

(Not sure if this is the right community for the question - please do downvote if stats. or whatever else is more appropriate...)

I'm looking for a suggestion for either a command-line tool or a library (preferably Python or Ruby, but at this point, anything will do) implementing sequence tagging/labelling that is not specific to parts of speech. A PoS-specific tagger that can be re-trained for custom categories would be fine, too.

The projects I've found mostly seem to be abandoned PhD thesis codebases or similar and I've not been able to make any of them work in a practical manner. The one I got the furthest with was pytorch-sequence-tagger.

In case it helps with giving suggestions: the purpose is to tell apart tokens which are part of library class marks from tokens which are part of author names or book titles, but where the input data are too irregular for a rule-based system to work 100%.

Topic labelling nlp

Category Data Science


One can find sequence labelling libraries by searching for the term "conditional random fields" (CRF), a standard method for this task. One could probably also find libraries and tutorials by searching for the term "Named Entity Recognition", which is the most common NLP application of sequence labelling.
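To give a rough idea of what working with a CRF tagger looks like for the library-record use case in the question, here is a minimal sketch. The feature set, the label names (CLASSMARK, AUTHOR, TITLE) and the example record are made up for illustration. The training helper assumes the third-party sklearn-crfsuite package is installed; that is only one possible choice, and the rest of the code runs without it.

```python
def token_features(tokens, i):
    """Describe token i as a dict of features (strings, booleans, numbers)."""
    tok = tokens[i]
    feats = {
        "bias": 1.0,
        "lower": tok.lower(),
        "is_upper": tok.isupper(),
        "is_title": tok.istitle(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_punct": any(not c.isalnum() for c in tok),
        "length": len(tok),
    }
    # Context features: neighbouring tokens, or sequence-boundary markers.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sequence_features(tokens):
    """One feature dict per token, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def train_crf(X, y):
    """Fit a linear-chain CRF. Assumes `pip install sklearn-crfsuite`."""
    import sklearn_crfsuite  # imported lazily so the rest runs without it

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                               max_iterations=100)
    crf.fit(X, y)
    return crf


# Hypothetical training record: one label per token.
tokens = ["QA76.9", "Knuth", ",", "Donald",
          "The", "Art", "of", "Computer", "Programming"]
labels = ["CLASSMARK", "AUTHOR", "AUTHOR", "AUTHOR",
          "TITLE", "TITLE", "TITLE", "TITLE", "TITLE"]

X = [sequence_features(tokens)]  # list of sequences of feature dicts
y = [labels]                     # list of sequences of labels
```

With a realistic number of labelled records, calling train_crf(X, y) and then predict on the features of unseen records would return one label per token. A single record like the one above is far too little data to learn anything useful.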

Here are a few libraries that I know of:

See also this question.
