Is the number of bidirectional LSTMs in encoder-decoder model equal to the maximum length of input text/characters?

Question

Joe Black

April 26, 2022, 07:03

I'm confused about this aspect of RNNs while trying to learn how a seq2seq encoder-decoder works at https://machinelearningmastery.com/configure-encoder-decoder-model-neural-machine-translation/.

It seems to me that the number of LSTMs in the encoder would have to be the same as number of words in the text (if word embeddings are being used) or characters in the text (if char embeddings are being used). For char embeddings, each embedding would correspond to 1 LSTM in 1 direction and 1 encoder hidden state.

Is this understanding correct?

For example, if we have another model that uses an encoder-decoder for a different application (say, the text-to-speech synthesis described at https://ai.googleblog.com/2017/12/tacotron-2-generating-human-like-speech.html) that uses 256 LSTMs in each direction of the bidirectional encoder, does it mean the input to this encoder is limited to 256 characters of text?

Does the decoder output have to be the same length as the encoder input, or can it be different? If it can be different, what determines the decoder output length?

Topic attention-mechanism lstm rnn word-embeddings nlp

Category Data Science

Allohvk · Accepted Answer · November 21, 2020, 14:00

There is a slight problem with the terminology that you have used.

An RNN is a 'recurrent' neural net, and an LSTM is a type of RNN. So when you say "number of LSTMs in the encoder would have to be the same as number of words in the text", I suppose what you mean is that the number of time-steps (unrolled copies of the same cell) in the single LSTM layer is the same as the number of words in the sentence. If this is what you actually meant, then you are correct. For example, if you have a set of 10,000 tweets, each 20 words long, you can use a single LSTM layer with 20 time-steps to process them. You do NOT use 20 LSTM layers.

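This should answer your second question also. The 256 in the Tacotron 2 description is the number of hidden units of the LSTM layer in each direction, not a number of time-steps, so it does not limit the input to 256 characters. The decoder likewise does not have 20 LSTMs. It probably has 1 LSTM layer with 20 time-steps.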
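To make the time-steps versus units distinction concrete, here is a minimal pure-Python sketch of a bidirectional LSTM layer. It is a toy under stated assumptions, not how any particular library implements it: `TinyLSTMCell`, `run_layer`, `run_bidirectional`, and the sizes `EMBED_DIM` and `HIDDEN` are all made-up names and values, and the weights are random rather than trained. The point is only that one set of weights is reused at every time-step, so the same layer handles a 5-step and a 20-step input.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """Toy LSTM cell: a single set of weights reused at every time-step."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix and one bias vector per gate:
        # i = input, f = forget, o = output, c = candidate cell state.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # concatenate the input vector and the previous hidden state

        def gate(g, act):
            return [act(sum(w * v for w, v in zip(row, z)) + bias)
                    for row, bias in zip(self.W[g], self.b[g])]

        i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
        cand = gate("c", math.tanh)
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def num_parameters(self):
        weights = sum(len(row) for g in "ifoc" for row in self.W[g])
        biases = sum(len(self.b[g]) for g in "ifoc")
        return weights + biases


def run_layer(cell, sequence):
    """Unroll the same cell over a sequence: one hidden state per time-step."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    states = []
    for x in sequence:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states


def run_bidirectional(fwd_cell, bwd_cell, sequence):
    """Run one cell left-to-right, another right-to-left, and join the states."""
    fwd = run_layer(fwd_cell, sequence)
    bwd = run_layer(bwd_cell, sequence[::-1])[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


EMBED_DIM, HIDDEN = 3, 4  # illustrative sizes only
fwd_cell = TinyLSTMCell(EMBED_DIM, HIDDEN, seed=1)
bwd_cell = TinyLSTMCell(EMBED_DIM, HIDDEN, seed=2)

rng = random.Random(0)
short_seq = [[rng.random() for _ in range(EMBED_DIM)] for _ in range(5)]
long_seq = [[rng.random() for _ in range(EMBED_DIM)] for _ in range(20)]

out_short = run_bidirectional(fwd_cell, bwd_cell, short_seq)
out_long = run_bidirectional(fwd_cell, bwd_cell, long_seq)

print(len(out_short), len(out_long))  # number of states follows the input length
print(len(out_short[0]))              # 2 * HIDDEN values per time-step
print(fwd_cell.num_parameters())      # depends on EMBED_DIM and HIDDEN only
```

The parameter count depends only on the embedding size and the hidden size, never on the sequence length. In this sketch `HIDDEN` plays the role that a figure like "256 in each direction" plays in a model description.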

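The decoder output length need not match the encoder input length. In other words, it need not be a word-for-word translation. The models typically use beginning-of-sequence (BOS) and end-of-sequence (EOS) markers to start and stop decoding, so the output is as long as whatever the decoder generates before it emits EOS (usually with a maximum length as a safety cap).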
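The BOS/EOS stopping rule can be sketched as a greedy decoding loop. This is only an illustration: `toy_decoder_step` and `TOY_TABLE` are hypothetical stand-ins for a trained decoder (a real one would compute the next token from the encoder states and its own hidden state), and `max_len` is an assumed safety cap.

```python
BOS, EOS = "<bos>", "<eos>"

# Hypothetical lookup table standing in for a trained model.
TOY_TABLE = {
    ("merci",): ["thank", "you"],
    ("comment", "allez", "vous"): ["how", "are", "you"],
    ("il", "y", "a"): ["there", "is"],
}


def toy_decoder_step(context, generated):
    """Return the next output token given the source and the tokens so far."""
    target = TOY_TABLE.get(context, [])
    position = len(generated) - 1  # tokens emitted so far, not counting BOS
    return target[position] if position < len(target) else EOS


def greedy_decode(context, step_fn, max_len=10):
    """Start from BOS and keep generating until EOS or max_len tokens."""
    generated = [BOS]
    while len(generated) - 1 < max_len:
        token = step_fn(context, generated)
        if token == EOS:
            break
        generated.append(token)
    return generated[1:]  # drop the BOS marker


for source in TOY_TABLE:
    output = greedy_decode(source, toy_decoder_step)
    print(len(source), "tokens in ->", len(output), "tokens out:", output)
```

Here a 1-token input yields 2 output tokens and a 3-token input yields 2 or 3, because the loop length is decided by when EOS appears, not by the input length.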
