Transfer learning from a language model to classification
Following this fast.ai lecture, I am trying to understand the mechanism of Transfer Learning in NLP from a general Language Model (LM) to a classification problem.
What exactly is taken from the language model training? Is it just the word embeddings, or also the weights of the LSTM cell? The architecture of the two networks should be quite different: in an LM you output a prediction after every sequence step, whereas in a classification problem you only care about the output of the final sequence step.
(I would be happy to know what the general practice is and, if anyone knows, how fast.ai does it.)
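To make the question concrete, here is a minimal PyTorch sketch of the two architectures as I picture them. It assumes that both the embedding and the LSTM weights are carried over and that only the output layer is replaced; that is exactly the part I am unsure about, and it is not meant to describe what fast.ai actually does. The class names (`Encoder`, `LanguageModel`, `Classifier`) and all sizes are made up for illustration.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Embedding + LSTM: the part that could, hypothetically, be shared."""

    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, tokens):
        # tokens: (batch, seq_len) -> outputs: (batch, seq_len, hidden_dim)
        outputs, _ = self.lstm(self.embedding(tokens))
        return outputs


class LanguageModel(nn.Module):
    """Predicts the next token at every sequence step."""

    def __init__(self, encoder, hidden_dim, vocab_size):
        super().__init__()
        self.encoder = encoder
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # One prediction per step: (batch, seq_len, vocab_size)
        return self.decoder(self.encoder(tokens))


class Classifier(nn.Module):
    """Reuses the encoder but only reads the final sequence step."""

    def __init__(self, encoder, hidden_dim, num_classes):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden_dim, num_classes)  # new, untrained layer

    def forward(self, tokens):
        last_step = self.encoder(tokens)[:, -1, :]  # (batch, hidden_dim)
        return self.head(last_step)  # (batch, num_classes)


VOCAB, EMB, HID, CLASSES = 100, 16, 32, 2

lm = LanguageModel(Encoder(VOCAB, EMB, HID), HID, VOCAB)
# ... the language model would be trained here ...

# Assumed transfer step: hand the whole encoder to the classifier.
clf = Classifier(lm.encoder, HID, CLASSES)

tokens = torch.randint(0, VOCAB, (4, 10))  # dummy batch: 4 sequences of 10 tokens
lm_logits = lm(tokens)    # shape (4, 10, 100)
clf_logits = clf(tokens)  # shape (4, 2)
```

If only the embeddings were transferred, the classifier would instead copy `lm.encoder.embedding` and start with a freshly initialized LSTM.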
Topic transfer-learning classification language-model nlp
Category Data Science