Transfer learning between Language Model and classification

Following this fast.ai lecture, I am trying to understand the mechanism of Transfer Learning in NLP from a general Language Model (LM) to a classification problem.

What exactly is taken from the Language Model training? Is it just the word embeddings, or also the weights of the LSTM cells? The architecture of the neural net should be quite different: in an LM you output a prediction after every sequence step, whereas in a classification problem you only care about the output of the final sequence step.

(I would be happy to know what the general practice is, and also if anyone knows how fast.ai does it.)

Topic transfer-learning classification language-model nlp

Category Data Science


OK, so it seems the whole idea of transfer learning in NLP is to use more than just the word embeddings, which are considered "low level", and instead transfer higher-level representations. This is akin to what goes on in computer vision, where the final (or almost final) layer embeddings of nets trained on ImageNet are reused in transfer learning for other tasks, or as the authors dubbed it: "a counterpart of ImageNet for NLP". This is similar to what GPT is doing.
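For illustration, here is a minimal PyTorch-style sketch (not the actual fast.ai code; the class names `LMEncoder` and `LanguageModel` are made up) of what "transferring more than the embeddings" means: the embedding layer *and* the LSTM stack are pre-trained as part of the LM, and only the LM's output head is discarded when moving to classification.

```python
import torch
import torch.nn as nn

class LMEncoder(nn.Module):
    """Embedding + stacked LSTM, shared between the LM and the classifier."""
    def __init__(self, vocab_size, emb_dim=400, hidden_dim=1150, n_layers=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=n_layers,
                            batch_first=True, dropout=0.3)

    def forward(self, tokens):
        # tokens: (batch, seq_len) -> hidden states: (batch, seq_len, hidden_dim)
        emb = self.embedding(tokens)
        outputs, _ = self.lstm(emb)
        return outputs

class LanguageModel(nn.Module):
    """LM head: predict the next token at every time step."""
    def __init__(self, encoder, vocab_size, hidden_dim=1150):
        super().__init__()
        self.encoder = encoder
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        return self.decoder(self.encoder(tokens))  # (batch, seq_len, vocab)

# After LM pre-training and fine-tuning, the whole encoder (embedding weights
# plus LSTM weights) is kept; only the decoder is replaced by a classification
# head (see the sketch further below).
```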

You can find more information in the accompanying article one of the authors wrote.

Specifically, from what I gather after reading the paper, they use a 3-layer LSTM with dropout for the Language Model part. They then fine-tune it on the text of the target task (using a number of tricks specified in the paper/lecture). Next, they keep the same architecture and weights but augment the model with two additional linear blocks. The input to these final blocks is a concatenation of the final hidden state with the max-pooled and mean-pooled representations of all the hidden states ("concat pooling"). This model is then fine-tuned on the supervised task (classification) using "gradual unfreezing", where layer groups are unfrozen one at a time, starting from the classifier head.
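A rough sketch of those two ideas (hypothetical names, not the actual fastai API; it assumes an encoder like the `LMEncoder` above that returns per-step hidden states):

```python
import torch
import torch.nn as nn

class ConcatPoolingClassifier(nn.Module):
    """Classifier built on a pretrained encoder: the encoder's hidden states are
    pooled ("concat pooling") and passed through two extra linear blocks."""
    def __init__(self, encoder, hidden_dim, n_classes, head_dim=50):
        super().__init__()
        self.encoder = encoder                 # pretrained embedding + LSTM stack
        self.head = nn.Sequential(             # the two additional linear blocks
            nn.Linear(3 * hidden_dim, head_dim),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(head_dim, n_classes),
        )

    def forward(self, tokens):
        outputs = self.encoder(tokens)              # (batch, seq_len, hidden_dim)
        last = outputs[:, -1, :]                    # final hidden state
        max_pool = outputs.max(dim=1).values        # max over all time steps
        mean_pool = outputs.mean(dim=1)             # mean over all time steps
        pooled = torch.cat([last, max_pool, mean_pool], dim=1)  # concat pooling
        return self.head(pooled)

def gradual_unfreeze(layer_groups, stage):
    """Unfreeze one more layer group per fine-tuning stage, starting at the head.
    layer_groups[0] is the classifier head; deeper groups follow."""
    for i, group in enumerate(layer_groups):
        trainable = i <= stage
        for p in group.parameters():
            p.requires_grad = trainable

# e.g. stage 0: train only the head; stage 1: also the LSTM; stage 2: everything
# gradual_unfreeze([clf.head, clf.encoder.lstm, clf.encoder.embedding], stage=0)
```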
