What machine learning algorithms to use for unsupervised POS tagging?

Question

Tido

September 23, 2021, 01:45

I am interested in an unsupervised approach to training a POS-tagger.

Labeling is very difficult, and I would like to test a tagger for my specific domain (chats), where users typically write in lowercase, etc. If it matters, the data is mostly in German.

I have read about older techniques such as HMMs, but maybe there are newer and better ways?

Topic unsupervised-learning parsing nlp machine-learning

Category Data Science

MkL · Accepted Answer · September 23, 2021, 01:45

I would be very interested to hear what you need a tagger for in the context of chatbots.

Maybe you just need a stemmer, to produce the 'base form' of an inflected word?

In that case, you can check this.

Zenquiorra · Accepted Answer · September 22, 2021, 13:30

There is no genuinely unsupervised method for POS tagging. Parts of speech are categories that we infer ourselves, using rules defined by the specific language being tagged. There is no mathematical notion of a part of speech that can be derived from raw text without some predefined, empirically established rule, which is why the task is never genuinely unsupervised.

One weakly supervised approach is to estimate the parameters of a hidden Markov model (HMM), whose hidden states stand for the tags, using the Baum-Welch algorithm.
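
To make the Baum-Welch idea concrete, here is a minimal pure-Python sketch for a discrete HMM. The toy lowercase German corpus, the choice of three hidden states, the random initialisation, and all function names are assumptions made for illustration; nothing here is tuned for real chat data.

```python
import math
import random

def normalize(weights):
    """Scale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass for one sequence of symbol ids."""
    K, T = len(pi), len(obs)
    alpha = [[0.0] * K for _ in range(T)]
    beta = [[1.0] * K for _ in range(T)]
    scale = [0.0] * T
    for t in range(T):
        for j in range(K):
            prior = pi[j] if t == 0 else sum(alpha[t - 1][i] * A[i][j] for i in range(K))
            alpha[t][j] = prior * B[j][obs[t]]
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]
    for t in range(T - 2, -1, -1):
        for i in range(K):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(K)
            ) / scale[t + 1]
    return alpha, beta, scale

def baum_welch(sequences, K, V, iterations=20, seed=0):
    """EM for an HMM with K hidden states and V observable symbols."""
    rng = random.Random(seed)

    def rand_dist(n):
        return normalize([rng.random() + 0.1 for _ in range(n)])

    pi = rand_dist(K)
    A = [rand_dist(K) for _ in range(K)]   # transition probabilities
    B = [rand_dist(V) for _ in range(K)]   # emission probabilities
    history = []                           # log-likelihood per iteration
    for _ in range(iterations):
        pi_num = [0.0] * K
        A_num = [[0.0] * K for _ in range(K)]
        B_num = [[0.0] * V for _ in range(K)]
        loglik = 0.0
        for obs in sequences:
            # E-step: expected state and transition counts under current parameters
            alpha, beta, scale = forward_backward(obs, pi, A, B)
            loglik += sum(math.log(c) for c in scale)
            for t, o in enumerate(obs):
                for i in range(K):
                    gamma = alpha[t][i] * beta[t][i]
                    B_num[i][o] += gamma
                    if t == 0:
                        pi_num[i] += gamma
                    if t + 1 < len(obs):
                        for j in range(K):
                            A_num[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                            * beta[t + 1][j] / scale[t + 1])
        # M-step: re-estimate parameters from the expected counts
        pi = normalize(pi_num)
        A = [normalize(row) for row in A_num]
        B = [normalize(row) for row in B_num]
        history.append(loglik)
    return pi, A, B, history

# Toy lowercase corpus (illustrative only)
corpus = ["der hund bellt", "die katze rennt", "der hund rennt",
          "die katze frisst", "ein hund frisst"]
vocab = sorted({w for s in corpus for w in s.split()})
word_id = {w: i for i, w in enumerate(vocab)}
sequences = [[word_id[w] for w in s.split()] for s in corpus]

pi, A, B, history = baum_welch(sequences, K=3, V=len(vocab))
print(history[0], history[-1])  # log-likelihood before the first and last update
```

The learned states are anonymous clusters. Mapping them to tags such as NOUN or VERB still needs a tag dictionary or a few labelled examples, which is the sense in which the approach is only weakly supervised.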

Another is to implement a maximum entropy model decoded with beam search, with rules established empirically (hence, not truly unsupervised).
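
As a rough sketch of how beam search combines with a maximum entropy style model: each position gets a softmax distribution over tags computed from feature weights, and the decoder keeps only the best few partial tag sequences. The tag set, the features, and the hand-set `WEIGHTS` below are hypothetical stand-ins for a trained model.

```python
import math

TAGS = ["DET", "NOUN", "VERB"]

# Hypothetical hand-set feature weights standing in for a trained MaxEnt model.
WEIGHTS = {
    ("word=der", "DET"): 2.0,
    ("word=die", "DET"): 2.0,
    ("prev=<s>", "DET"): 0.5,
    ("prev=DET", "NOUN"): 1.5,
    ("prev=NOUN", "VERB"): 1.5,
    ("suffix=t", "VERB"): 1.0,
}

def features(words, i, prev_tag):
    """Features of position i: the word, its last letter, and the previous tag."""
    w = words[i]
    return [f"word={w}", f"suffix={w[-1]}", f"prev={prev_tag}"]

def tag_log_probs(words, i, prev_tag):
    """log p(tag | context) as a softmax over summed feature weights."""
    feats = features(words, i, prev_tag)
    scores = {t: sum(WEIGHTS.get((f, t), 0.0) for f in feats) for t in TAGS}
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return {t: s - log_z for t, s in scores.items()}

def beam_search(words, beam_size=2):
    """Keep the beam_size best partial tag sequences at every position."""
    beam = [(0.0, [])]  # (log-probability, tags so far)
    for i in range(len(words)):
        candidates = []
        for score, tags in beam:
            prev = tags[-1] if tags else "<s>"
            for tag, lp in tag_log_probs(words, i, prev).items():
                candidates.append((score + lp, tags + [tag]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    return beam[0][1]

best = beam_search(["der", "hund", "bellt"])
print(best)  # ['DET', 'NOUN', 'VERB']
```

Beam search is approximate: with a small `beam_size` the best full sequence can be pruned early, so the width trades speed against accuracy.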

Edward Weinert · Accepted Answer · May 12, 2019, 23:15

Fortunately, you don't need unsupervised methods for PoS tagging for most languages, and especially not for German. There are semi-supervised or "weakly" supervised methods, such as the older HMM/EM approaches already mentioned; however, there is also a newer solution based on Error-Correcting Output-Code classification: "Weakly supervised POS tagging without disambiguation".

Of course, the accuracy of fully supervised methods such as LSTMs is far better than that of semi-supervised ones, but because of the known drawbacks of fully supervised methods (e.g., a lot of manual work), people still look for lazier approaches. Excellent accuracy always comes at a higher cost.
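
For readers unfamiliar with error-correcting output codes, the generic decoding step can be sketched as follows. This is not the method of the cited paper: the codebook, the tag set, and the optional restriction to dictionary candidates are assumptions chosen only to illustrate the idea that each tag has a binary codeword and a prediction is decoded to the nearest codeword.

```python
# Hypothetical 5-bit codewords, one per tag; each bit would be predicted
# by its own binary classifier in a real ECOC setup.
CODEBOOK = {
    "DET":  (0, 0, 1, 1, 0),
    "NOUN": (1, 0, 0, 1, 1),
    "VERB": (0, 1, 0, 0, 1),
    "ADJ":  (1, 1, 1, 0, 0),
}

def hamming(a, b):
    """Number of positions where two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(bits, candidates=None):
    """Return the tag whose codeword is closest to the predicted bits.

    candidates optionally restricts the choice, e.g. to the tags a
    dictionary allows for the current word (an assumption of this sketch).
    """
    tags = list(candidates) if candidates else list(CODEBOOK)
    return min(tags, key=lambda t: hamming(CODEBOOK[t], bits))

# The NOUN codeword with its last bit flipped is still decoded as NOUN.
print(decode((1, 0, 0, 1, 0)))                              # NOUN
print(decode((1, 0, 0, 1, 0), candidates=["DET", "VERB"]))  # DET
```

Because every pair of codewords in this toy codebook differs in at least three bits, a single wrong binary classifier cannot change the decoded tag.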

Brian Spiering · Accepted Answer · July 18, 2018, 13:45

There are no unsupervised methods for training a POS tagger that perform comparably to human annotation or to supervised methods.

The current state-of-the-art supervised methods for training POS taggers are long short-term memory (LSTM) neural networks.
