Pretrained vs. finetuned model

I have a question about terminology. When working with Hugging Face transformer models, I often read about using pretrained models for classification versus fine-tuning a pretrained model for classification.

I fail to understand the exact difference between the two. As I understand it, a pretrained model by itself cannot be used for classification, regression, or any related task without attaching at least one more dense layer and an output layer, and then training the model. In this case, we would keep all the pretrained model's weights frozen and only train the last couple of custom layers.
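As a concrete sketch of what I mean (using PyTorch, with a tiny hypothetical module standing in for a real pretrained Hugging Face encoder):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained encoder; in practice this would be
# something like a Hugging Face BERT model loaded with from_pretrained().
pretrained_encoder = nn.Sequential(
    nn.Embedding(1000, 64),
    nn.Linear(64, 64),
)

# Freeze the pretrained weights: they receive no gradient updates.
for param in pretrained_encoder.parameters():
    param.requires_grad = False

# New, randomly initialised head, trained from scratch.
classifier_head = nn.Sequential(
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 2),  # e.g. binary classification
)

model = nn.Sequential(pretrained_encoder, classifier_head)

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(f"trainable={trainable}, frozen={frozen}")
```

Here only the head's parameters ever change; the pretrained section is used purely as a fixed feature extractor.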

When a task calls for fine-tuning a model, how does that differ from the case above? Does fine-tuning also involve reinitializing the weights of the pretrained section and retraining the entire model?
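To make the alternative I am asking about concrete, this is a minimal sketch (again with a hypothetical stand-in module, not a real Hugging Face model) of what I would call fine-tuning the whole network:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained encoder plus a new head.
model = nn.Sequential(
    nn.Linear(64, 64),  # "pretrained" layer: weights are kept, not reinitialised
    nn.Linear(64, 2),   # new classification head
)

# Fine-tuning: every parameter stays trainable; a small learning rate keeps
# the pretrained weights close to their starting point.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

x = torch.randn(8, 64)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
```

So the pretrained weights serve as the initialization and are updated together with the new layers, rather than being retrained from scratch.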

Topic: pretraining, transformer, finetuning, transfer-learning

Category: Data Science


Even if both expressions are often treated as the same thing in practice, it is worth drawing a line between "reusing" and "fine-tuning" a model.

We reuse a model to keep part of its architecture or internal mechanism for an application different from the original one. For example, we can reuse a GPT-2 model initially trained on English and adapt it to another language such as Chinese, which implies deep changes from the initial model to the new one.

On the other hand, we fine-tune a model to improve an already existing application, or a slightly different one, by changing specific hyperparameters or using better algorithms (for instance, AdamW instead of plain gradient descent). There are plenty of methods in NLP for improving existing models, which is why it can be considered an area of its own.
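The AdamW point can be sketched as follows (a toy model stands in for the network being fine-tuned; the model and data here are illustrative assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a model being fine-tuned.
model = nn.Linear(10, 2)

# AdamW applies weight decay directly to the weights instead of folding it
# into the gradient, which is why it is generally preferred over plain
# gradient descent when fine-tuning transformers.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

before = model.weight.detach().clone()
loss = model(torch.randn(4, 10)).pow(2).mean()
loss.backward()
optimizer.step()

# The small-learning-rate update nudges the weights slightly.
changed = not torch.equal(before, model.weight.detach())
```

Note that nothing here touches the architecture: fine-tuning in this sense is about the training procedure, not about rebuilding the model.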

It could be regarded as a purely semantic issue, but I think it is worth not confusing the two expressions.
