Attention weights - when do they change during learning and prediction?
Assume a simple LSTM followed by an attention layer, or a full transformer architecture. My understanding is that the attention weights are learned during training and are multiplied with the keys, queries and values.
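For concreteness, here is a minimal single-head sketch of how I picture the computation (PyTorch is just my assumption for illustration; the names w_q, w_k, w_v are mine). The nn.Linear projections hold the learned parameters, while the softmax scores are recomputed from each input:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleAttention(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        # These projection matrices are the parameters updated by backprop during training.
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        self.w_k = nn.Linear(d_model, d_model, bias=False)
        self.w_v = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        # The attention scores depend on the input, so they differ for every example,
        # but computing them does not modify w_q / w_k / w_v.
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        attn = F.softmax(scores, dim=-1)
        return attn @ v

layer = SimpleAttention(d_model=8)
out = layer(torch.randn(2, 5, 8))  # forward pass only; no weight update happens here
```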
Please correct me if my understanding above is wrong, or if the questions below rest on a misunderstanding.
The question is: when do the weights of the attention layer change, and when do they not?
- Do the attention layer weights change for each input in the sequence? (I assume not, but please confirm.)
- Are the attention layer weights frozen during prediction (inference), or do they keep changing?
- In transformers or BERT, are these weights supplied as part of the pretrained model? (See the sketch after this list for how I'd check.)
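Regarding the last point, here is roughly how I'd inspect the checkpoint, assuming the Hugging Face `transformers` library (the attribute path is specific to `BertModel`; other architectures may differ):

```python
from transformers import BertModel

# Load a pretrained checkpoint; all encoder parameters come from it.
model = BertModel.from_pretrained("bert-base-uncased")

# The query projection matrix of the first encoder layer is an ordinary
# parameter loaded from the pretrained weights.
q_weight = model.encoder.layer[0].attention.self.query.weight
print(q_weight.shape)  # e.g. torch.Size([768, 768]) for bert-base

# For inference, the model is typically put in eval mode; a forward pass
# does not update these parameters in any case.
model.eval()
```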
Topic: transformer, attention-mechanism, sequence-to-sequence
Category: Data Science