Special tokens for encoder and decoder in the transformer architecture

I am trying to wrap my head around the special tokens that different transformer architectures use.

For example, let's say we have the following input and targets, one for a text generation example and one for a text classification example:

  • Input: My cat is black
  • Target_generation: He is a good cat
  • Target_classification: Positive

Now, for the text classification, using something like BERT, I know I have to do the following:

  • Encoder input: [CLS, My, cat, is, black] (in practice BERT's tokenizer also appends a final [SEP])
  • Pool the encoder output of the CLS token. BERT has no decoder.
  • Target: [Positive]

And that's it. So the output of the CLS token can be used (although the authors do not recommend it) as a sentence embedding for classification.
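For reference, here is roughly what I mean, as a minimal sketch assuming the Hugging Face transformers library (the checkpoint name and the untrained 2-class head are illustrative assumptions only):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative checkpoint; any BERT-style encoder would do.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("My cat is black", return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()))
# ['[CLS]', 'my', 'cat', 'is', 'black', '[SEP]']

with torch.no_grad():
    outputs = model(**inputs)

cls_vector = outputs.last_hidden_state[:, 0]  # hidden state of the [CLS] token

# Hypothetical, untrained classification head: 2 classes, e.g. Negative/Positive.
classifier = torch.nn.Linear(cls_vector.size(-1), 2)
logits = classifier(cls_vector)  # shape (1, 2)
```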

My questions begin when it comes to text generation or text translation. Since, from what I understand, both can be done equally well with an encoder-decoder transformer, I'll get straight to my doubts.

  1. Which of the following is the input to the encoder? Does it need special BOS and EOS tokens? From what I understand, the encoder doesn't get any valuable information from them, since they appear in absolutely every training example... but I am not sure. (A sketch after this list checks what one real tokenizer actually does.)

    • Input encoder = [My, cat, is, black]
    • Input encoder = [BOS, My, cat, is, black]
    • Input encoder = [BOS, My, cat, is, black, EOS]
  2. The input to the decoder would be the target sentence shifted right. Is it correct to assume that in this case it would be [BOS, He, is, a, good, cat]? Or does it require an EOS at the end: [BOS, He, is, a, good, cat, EOS]?

  3. In the same manner, the training target has to have the same length as the shifted decoder input, right? So, if the decoder input is [BOS, He, is, a, good, cat], would the target be [He, is, a, good, cat, EOS]? (See the same sketch below.)
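To make these questions concrete, here is a small sketch. It assumes the Hugging Face transformers library, and facebook/bart-base is purely an example checkpoint; the token lists for questions 2 and 3 reflect how I currently understand teacher forcing, not a confirmed answer:

```python
from transformers import AutoTokenizer

# Question 1: an empirical check of what one real seq2seq tokenizer
# adds to the encoder input ("facebook/bart-base" is just an example).
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
ids = tokenizer("My cat is black")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
# ['<s>', 'My', 'Ġcat', 'Ġis', 'Ġblack', '</s>']

# Questions 2 and 3: teacher forcing laid out with plain token lists.
target = ["He", "is", "a", "good", "cat"]
decoder_input = ["BOS"] + target  # target shifted right
labels = target + ["EOS"]         # what the model must predict at each step
assert len(decoder_input) == len(labels)
for step, (seen, predicted) in enumerate(zip(decoder_input, labels)):
    print(f"step {step}: decoder has seen up to {seen!r} -> should predict {predicted!r}")
```

Part of my confusion is that architectures seem to disagree here: BART's tokenizer wraps the source sentence in `<s>` ... `</s>`, while T5's appends only `</s>`.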

And, as a bonus question (although less important to me than the ones above), what exactly is the purpose of the SEP token in BERT?
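To make the bonus question concrete, this is what a sentence-pair encoding looks like in practice (a minimal sketch, again assuming the Hugging Face tokenizer; the checkpoint name is just an example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Encoding two sentences as a pair: [SEP] sits between and after them.
encoded = tokenizer("My cat is black", "He is a good cat")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# ['[CLS]', 'my', 'cat', 'is', 'black', '[SEP]', 'he', 'is', 'a', 'good', 'cat', '[SEP]']

print(encoded["token_type_ids"])
# [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  segment ids: 0 = sentence A, 1 = sentence B
```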

Thank you very much.
