Special tokens for encoder and decoder in the transformer architecture

I am trying to wrap my head around the special tokens that different transformer architectures use.

For example, let's say we have the following input and targets, one for a text generation example and one for a text classification example:

  • Input: My cat is black
  • Target_generation: He is a good cat
  • Target_classification: Positive

Now, for the text classification, using something like BERT, I know I have to do the following:

  • Encoder input: [CLS, My, cat, is, black] (in practice BERT's tokenizer also appends a final [SEP])
  • Pool the encoder output of the CLS token. BERT has no decoder.
  • Target: [Positive]

And that's it. So the output of the CLS token can be used (although the authors do not recommend it) as a sentence embedding for classification.
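For reference, here is roughly what I mean, as a minimal sketch assuming the Hugging Face transformers library (the checkpoint name and the untrained 2-class head are illustrative assumptions only):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative checkpoint; any BERT-style encoder would do.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("My cat is black", return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()))
# ['[CLS]', 'my', 'cat', 'is', 'black', '[SEP]']

with torch.no_grad():
    outputs = model(**inputs)

cls_vector = outputs.last_hidden_state[:, 0]  # hidden state of the [CLS] token

# Hypothetical, untrained classification head: 2 classes, e.g. Negative/Positive.
classifier = torch.nn.Linear(cls_vector.size(-1), 2)
logits = classifier(cls_vector)  # shape (1, 2)
```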

My questions begin when it comes to text generation or text translation. Since, from what I understand, both can be done equally well with an encoder-decoder transformer, I'll get straight to my doubts.

  1. Which of the following is the input to the encoder? Does it need special BOS and EOS tokens? From what I understand, the encoder doesn't get any valuable information from them, since they appear in absolutely every training example... but I am not sure. (A sketch after this list checks what one real tokenizer actually does.)

    • Input encoder = [My, cat, is, black]
    • Input encoder = [BOS, My, cat, is, black]
    • Input encoder = [BOS, My, cat, is, black, EOS]
  2. The input to the decoder would be the target sentence shifted right. Is it correct to assume that in this case it would be [BOS, He, is, a, good, cat]? Or does it require an EOS at the end: [BOS, He, is, a, good, cat, EOS]?

  3. In the same manner, the training target has to have the same length as the shifted decoder input, right? So, if the decoder input is [BOS, He, is, a, good, cat], would the target be [He, is, a, good, cat, EOS]? (See the same sketch below.)
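To make these questions concrete, here is a small sketch. It assumes the Hugging Face transformers library, and facebook/bart-base is purely an example checkpoint; the token lists for questions 2 and 3 reflect how I currently understand teacher forcing, not a confirmed answer:

```python
from transformers import AutoTokenizer

# Question 1: an empirical check of what one real seq2seq tokenizer
# adds to the encoder input ("facebook/bart-base" is just an example).
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
ids = tokenizer("My cat is black")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
# ['<s>', 'My', 'Ġcat', 'Ġis', 'Ġblack', '</s>']

# Questions 2 and 3: teacher forcing laid out with plain token lists.
target = ["He", "is", "a", "good", "cat"]
decoder_input = ["BOS"] + target  # target shifted right
labels = target + ["EOS"]         # what the model must predict at each step
assert len(decoder_input) == len(labels)
for step, (seen, predicted) in enumerate(zip(decoder_input, labels)):
    print(f"step {step}: decoder has seen up to {seen!r} -> should predict {predicted!r}")
```

Part of my confusion is that architectures seem to disagree here: BART's tokenizer wraps the source sentence in `<s>` ... `</s>`, while T5's appends only `</s>`.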

And, as a bonus question (although less important to me than the ones above), what exactly is the purpose of the SEP token in BERT?
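To make the bonus question concrete, this is what a sentence-pair encoding looks like in practice (a minimal sketch, again assuming the Hugging Face tokenizer; the checkpoint name is just an example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Encoding two sentences as a pair: [SEP] sits between and after them.
encoded = tokenizer("My cat is black", "He is a good cat")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# ['[CLS]', 'my', 'cat', 'is', 'black', '[SEP]', 'he', 'is', 'a', 'good', 'cat', '[SEP]']

print(encoded["token_type_ids"])
# [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  segment ids: 0 = sentence A, 1 = sentence B
```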

Thank you very much.
