two different attention methods for seq2seq

I see two different ways of applying attention in seq2seq:

(a) The context vector (the weighted sum of the encoder hidden states) is fed into the output softmax layer.

(b) The context vector is fed into the decoder input at the next time step.

What are the pros and cons of the two approaches? Is there a paper comparing them?

Tags: attention-mechanism, sequence-to-sequence

Category: Data Science


(a) is Luong's attention mechanism (Luong et al., 2015, "Effective Approaches to Attention-based Neural Machine Translation"), while (b) is Bahdanau's mechanism (Bahdanau et al., 2015, "Neural Machine Translation by Jointly Learning to Align and Translate").
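To make the structural difference concrete, here is a minimal PyTorch sketch of one decoder step in each style. It is not code from either paper: the function names (`luong_step`, `bahdanau_step`, `attention_context`) are made up for illustration, dot-product scoring is used for both variants, and the output layers are simplified.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_size, vocab_size, emb_size = 256, 10000, 128

def attention_context(dec_hidden, enc_outputs):
    # dec_hidden: (batch, hidden), enc_outputs: (batch, src_len, hidden)
    # Simple dot-product scores; both papers use more elaborate scoring functions.
    scores = torch.bmm(enc_outputs, dec_hidden.unsqueeze(2)).squeeze(2)  # (batch, src_len)
    weights = F.softmax(scores, dim=1)
    return torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)       # (batch, hidden)

# (a) Luong-style step: update the RNN first, then use the context in the output layer.
luong_rnn = nn.GRUCell(emb_size, hidden_size)
luong_out = nn.Linear(2 * hidden_size, vocab_size)

def luong_step(y_emb, dec_hidden, enc_outputs):
    dec_hidden = luong_rnn(y_emb, dec_hidden)                  # state updated from input only
    context = attention_context(dec_hidden, enc_outputs)       # attend with the new state
    logits = luong_out(torch.cat([dec_hidden, context], 1))    # context feeds the output softmax
    return logits, dec_hidden

# (b) Bahdanau-style step: compute the context first and feed it into the RNN input.
bahdanau_rnn = nn.GRUCell(emb_size + hidden_size, hidden_size)
bahdanau_out = nn.Linear(hidden_size, vocab_size)

def bahdanau_step(y_emb, dec_hidden, enc_outputs):
    context = attention_context(dec_hidden, enc_outputs)                   # attend with the previous state
    dec_hidden = bahdanau_rnn(torch.cat([y_emb, context], 1), dec_hidden)  # context feeds the decoder input
    logits = bahdanau_out(dec_hidden)
    return logits, dec_hidden
```

In the actual papers, Luong et al. compute attention from the current decoder state and pass the concatenated vector through a tanh layer before the softmax, while Bahdanau et al. compute attention from the previous decoder state using an additive (feed-forward) score; the sketch above keeps only the structural difference the question asks about, namely where the context vector enters the decoder.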
