Is a dense layer required for implementing Bahdanau attention?
I have seen that everyone adds a Dense() layer in their custom Bahdanau attention layer, which I don't think is needed.
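For reference, this is roughly the kind of custom layer I mean (a sketch in TensorFlow/Keras, following the pattern most tutorials use; the names and shapes are illustrative, not taken from any particular implementation):

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive (Bahdanau-style) attention; the Dense layers hold the learnable weights."""

    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # applied to the decoder state (query)
        self.W2 = tf.keras.layers.Dense(units)  # applied to the encoder outputs (values)
        self.V = tf.keras.layers.Dense(1)       # projects tanh(...) down to a scalar score

    def call(self, query, values):
        # query: (batch, hidden) decoder state; values: (batch, time, hidden) encoder outputs
        query_with_time_axis = tf.expand_dims(query, 1)
        # Additive score: V(tanh(W1*query + W2*values)), shape (batch, time, 1)
        score = self.V(tf.nn.tanh(self.W1(query_with_time_axis) + self.W2(values)))
        attention_weights = tf.nn.softmax(score, axis=1)
        # Weighted sum of the encoder outputs, shape (batch, hidden)
        context_vector = tf.reduce_sum(attention_weights * values, axis=1)
        return context_vector, attention_weights
```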
This is an image from a tutorial here. In it, we are just multiplying two vectors and then performing several operations on those vectors only. So what is the need for a Dense() layer? Is the tutorial on 'how does attention work' wrong?
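For comparison, my understanding of the score function in the original Bahdanau et al. (2015) paper is

$$ e_{ij} = v_a^\top \tanh\!\left(W_a s_{i-1} + U_a h_j\right), $$

where $s_{i-1}$ is the previous decoder state and $h_j$ is the $j$-th encoder annotation; the weights $W_a$, $U_a$ and $v_a$ are presumably what the Dense() layers are implementing, which is what I'm trying to reconcile with the picture from the tutorial.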
Topic attention-mechanism deep-learning machine-learning
Category Data Science