What are the differences between self-supervised and semi-supervised learning in NLP?

The GPT-1 paper mentions both semi-supervised learning and unsupervised pre-training, but they seem the same to me. Moreover, Semi-supervised Sequence Learning by Dai and Le also looks more like self-supervised learning. So what are the key differences between them?

Topic: pretraining, semi-supervised-learning, nlp

Category: Data Science


Semi-supervised learning means you have labels for only a fraction of the data; in self-supervised learning no human-provided labels are available at all. Imagine a huge question/answer dataset. No one labels that data, but you can still learn question answering, because the relation between a question and its answer can be retrieved from the data itself.
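To make the contrast concrete, here is a minimal sketch assuming a toy corpus (the sentences and the "predict the last word" task are illustrative, not from the original post):

```python
# Hypothetical toy corpus of sentences.
corpus = ["the cat sat", "dogs bark loudly", "birds can fly", "fish swim fast"]

# Semi-supervised: only a small fraction of examples carries human labels;
# the rest of the data is used without any labels.
labeled = [("the cat sat", "animal"), ("dogs bark loudly", "animal")]
unlabeled = corpus[2:]  # no labels exist for these examples

# Self-supervised: labels are manufactured from the data itself,
# e.g. predict the last word of each sentence from its prefix.
self_supervised = [(" ".join(s.split()[:-1]), s.split()[-1]) for s in corpus]
# first derived example: ("the cat", "sat")
```

The key point: in the semi-supervised case a human produced `labeled`, while in the self-supervised case every example in `self_supervised` was derived mechanically from the raw text.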

Or, when modeling documents, you need sentences that are similar and sentences that are dissimilar in order to learn document embeddings, but such detailed labels are usually not available. In this case you count sentences from the same document as similar and sentences from two different documents as dissimilar, and train your model on those derived labels (example idea: you can run topic modeling on the data to make the similar/dissimilar labels more accurate). Deriving the labels from the structure of the data itself is what makes this self-supervised.
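The pair-construction idea above can be sketched as follows; the corpus and the function name are hypothetical, and a real pipeline would feed these pairs into a contrastive training objective:

```python
import itertools
import random

# Hypothetical toy corpus: each document is a list of sentences.
documents = [
    ["The cat sat on the mat.", "It purred softly.", "Cats enjoy warm spots."],
    ["Stock prices fell sharply.", "Investors reacted quickly.", "Markets stay volatile."],
]

def make_contrastive_pairs(docs, n_negatives=3, seed=0):
    """Build labeled sentence pairs without human annotation:
    pairs from the same document -> similar (label 1),
    pairs from two different documents -> dissimilar (label 0)."""
    rng = random.Random(seed)
    pairs = []
    # Positive pairs: every sentence combination within one document.
    for doc in docs:
        for a, b in itertools.combinations(doc, 2):
            pairs.append((a, b, 1))
    # Negative pairs: sentences sampled from two different documents.
    for _ in range(n_negatives):
        i, j = rng.sample(range(len(docs)), 2)
        pairs.append((rng.choice(docs[i]), rng.choice(docs[j]), 0))
    return pairs

pairs = make_contrastive_pairs(documents)
```

With two three-sentence documents this yields six similar pairs and, here, three sampled dissimilar pairs; the labels come entirely from document membership, not from an annotator.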
