Understanding the Transformer's self-attention calculation
I was going through this article: https://www.analyticsvidhya.com/blog/2019/06/understanding-transformers-nlp-state-of-the-art-models/?utm_source=blog&utm_medium=demystifying-bert-groundbreaking-nlp-framework#comment-160771
What exactly are the Key and Value vectors in the self-attention calculation of the Transformer model?
And is the Query vector simply the embedding vector of the word being queried?
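To make the question concrete, here is a minimal NumPy sketch of scaled dot-product self-attention as I currently understand it from the article. The toy dimensions and the projection matrices `W_q`, `W_k`, `W_v` are placeholders I made up for illustration, not values from the article:

```python
import numpy as np

# Toy setup: 3 tokens, embedding size 4, head size 4 (all sizes illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))    # token embeddings, one row per word
W_q = rng.normal(size=(4, 4))  # learned projection producing Queries
W_k = rng.normal(size=(4, 4))  # learned projection producing Keys
W_v = rng.normal(size=(4, 4))  # learned projection producing Values

Q = X @ W_q  # Queries: what each token is looking for
K = X @ W_k  # Keys: what each token offers to be matched against
V = X @ W_v  # Values: the content that actually gets mixed together

# Scaled dot-product scores, then a row-wise softmax
scores = Q @ K.T / np.sqrt(K.shape[-1])
exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = exp / exp.sum(axis=-1, keepdims=True)

output = weights @ V  # each row is a weighted sum of Value vectors
print(output.shape)   # (3, 4): one contextualized vector per token
```

In this picture, the Query, Key, and Value are not the raw embeddings themselves but three separate learned projections of them. Is that the correct reading?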
Also, is the attention calculated in an RNN encoder-decoder different from the self-attention in the Transformer?
Topic: transformer, attention-mechanism
Category: Data Science