SAGAN - what is the correct architecture?
Hi, the original paper (https://arxiv.org/pdf/1805.08318.pdf) shows one scheme of the self-attention block, while a later overview (https://arxiv.org/pdf/1906.01529.pdf) shows a different scheme, citing the original paper.
My understanding matches the second paper's scheme more closely: it has two dot-product operations and three learned projection matrices,
$$W_k,\; W_v,\; W_q,$$
which correspond to $W_f, W_g, W_h$ in the original paper, but without the additional output projection $W_v$ that appears in the original paper's explanation.
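For reference, the original paper's formulation (as I read it) uses four learned matrices, $W_f, W_g, W_h$ plus the extra $W_v$:
$$f(x) = W_f x,\qquad g(x) = W_g x,\qquad h(x) = W_h x,$$
$$\beta_{j,i} = \frac{\exp(s_{ij})}{\sum_{i=1}^{N}\exp(s_{ij})},\qquad s_{ij} = f(x_i)^{\top} g(x_j),$$
$$o_j = W_v\!\left(\sum_{i=1}^{N}\beta_{j,i}\, h(x_i)\right),\qquad y_i = \gamma\, o_i + x_i.$$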
Is this a mistake in the original paper?
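To make my reading concrete, here is a minimal PyTorch-style sketch of the block, assuming the four-matrix formulation above (the class name, channel-reduction factor, and 1x1 convolutions for the projections are my own illustrative choices):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Sketch of a SAGAN-style self-attention block with f, g, h and an extra W_v projection."""

    def __init__(self, in_channels, reduction=8):
        super().__init__()
        inner = in_channels // reduction
        self.f = nn.Conv2d(in_channels, inner, kernel_size=1)  # W_f ("query"-like)
        self.g = nn.Conv2d(in_channels, inner, kernel_size=1)  # W_g ("key"-like)
        self.h = nn.Conv2d(in_channels, inner, kernel_size=1)  # W_h ("value"-like)
        self.v = nn.Conv2d(inner, in_channels, kernel_size=1)  # W_v, extra output projection
        self.gamma = nn.Parameter(torch.zeros(1))               # learned residual scale

    def forward(self, x):
        b, c, hgt, wid = x.shape
        n = hgt * wid
        f = self.f(x).view(b, -1, n)  # (b, inner, n)
        g = self.g(x).view(b, -1, n)  # (b, inner, n)
        h = self.h(x).view(b, -1, n)  # (b, inner, n)
        # first dot product: s_ij = f(x_i)^T g(x_j), softmax over i gives beta_{j,i}
        attn = F.softmax(torch.bmm(f.transpose(1, 2), g), dim=1)  # (b, n, n)
        # second dot product: o_j = sum_i beta_{j,i} h(x_i)
        o = torch.bmm(h, attn).view(b, -1, hgt, wid)              # (b, inner, hgt, wid)
        o = self.v(o)                                             # project back to in_channels
        return self.gamma * o + x                                 # residual connection
```

For example, `SelfAttention(64)(torch.randn(2, 64, 16, 16))` returns a tensor of the same shape; dropping `self.v` (and letting `h` output `in_channels` directly) would give the three-matrix scheme from the second paper.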