For an LSTM-based seq2seq model, is reversing the input still necessary or advised when using attention?
The original seq2seq paper reversed the source sequence and gave several reasons for doing so. See: "Why does LSTM performs better when the source target is reversed? (Seq2seq)"
But when using attention, is there still any benefit to doing this? I imagine that, since the decoder has access to all of the encoder hidden states at every decoding step, it can learn what to attend to on its own, so the input could be fed in its original order.
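To make the intuition concrete, here is a minimal sketch of one decoding step with dot-product (Luong-style) attention in PyTorch; the tensor shapes and random inputs are purely illustrative. The point is that the attention weights are computed over *all* encoder hidden states, so the decoder is not restricted to whatever the final encoder state happens to summarize, regardless of whether the source was reversed.

```python
import torch
import torch.nn.functional as F

batch, src_len, hidden = 2, 7, 16  # illustrative sizes, not from any real model

# Encoder outputs: one hidden state per source position (source order preserved).
encoder_states = torch.randn(batch, src_len, hidden)   # (B, T_src, H)
# Current decoder hidden state at a single decoding step.
decoder_state = torch.randn(batch, hidden)             # (B, H)

# Dot-product scores between the decoder state and every encoder state.
scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2)).squeeze(2)  # (B, T_src)
weights = F.softmax(scores, dim=1)                                         # (B, T_src)

# Context vector: attention-weighted sum of all encoder states,
# which is then combined with the decoder state to predict the next token.
context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)       # (B, H)
print(weights.shape, context.shape)  # torch.Size([2, 7]) torch.Size([2, 16])
```

So the question is whether, given this direct access to every source position, reversing the input still buys anything in practice.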
Topic: attention-mechanism, sequence-to-sequence, lstm, machine-translation
Category: Data Science