SAGAN - what is the correct architecture?

Hi, in the original paper (https://arxiv.org/pdf/1805.08318.pdf) the following scheme of the self-attention module appears:

In a later overview (https://arxiv.org/pdf/1906.01529.pdf), this scheme appears, referring to the original paper:

My understanding aligns more with the second paper's scheme, in which there are two dot-product operations and three learned parameter matrices $W_k$, $W_v$, $W_q$. These correspond to $W_f$, $W_g$, $W_h$ from the original paper, i.e. without the extra output matrix $W_v$ that appears in the original paper's explanation.
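To make the difference concrete, here is a minimal PyTorch sketch of a SAGAN-style self-attention block (this is my own illustration, not the paper's reference code; the module name, the channel-reduction factor, and the `use_out_proj` flag are assumptions). With `use_out_proj=False` it matches the three-matrix scheme (query/key/value only); with `use_out_proj=True` it adds the extra 1x1 output convolution that the original paper's text calls $W_v$:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention over spatial positions (illustrative sketch)."""

    def __init__(self, in_ch: int, use_out_proj: bool = True):
        super().__init__()
        # Three 1x1 convolutions: W_f (query), W_g (key), W_h (value).
        self.f = nn.Conv2d(in_ch, in_ch // 8, kernel_size=1)
        self.g = nn.Conv2d(in_ch, in_ch // 8, kernel_size=1)
        self.h = nn.Conv2d(in_ch, in_ch, kernel_size=1)
        # Optional fourth 1x1 convolution: the W_v output projection in question.
        self.v = nn.Conv2d(in_ch, in_ch, kernel_size=1) if use_out_proj else nn.Identity()
        # Learned scale for the residual connection (gamma in the paper).
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, height, width = x.shape
        q = self.f(x).flatten(2)        # (b, c//8, N) with N = H*W
        k = self.g(x).flatten(2)        # (b, c//8, N)
        val = self.h(x).flatten(2)      # (b, c, N)
        # First dot product: attention map over all pairs of spatial positions.
        attn = F.softmax(q.transpose(1, 2) @ k, dim=-1)   # (b, N, N)
        # Second dot product: attention-weighted sum of the value features.
        o = val @ attn.transpose(1, 2)                    # (b, c, N)
        o = self.v(o.view(b, c, height, width))           # apply (or skip) W_v
        return self.gamma * o + x
```

Either way the block is shape-preserving, e.g. `SelfAttention2d(64)(torch.randn(2, 64, 32, 32))` returns a `(2, 64, 32, 32)` tensor; the only difference between the two readings is whether a fourth learned matrix is applied before the residual connection.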

Is this a mistake in the original paper?

Tags: adversarial-ml, attention-mechanism, gan, deep-learning
