Why do Transformers need positional encodings?
At least in the first self-attention layer of the encoder, inputs have a one-to-one correspondence with outputs, so I have the following questions.
- Isn't ordering already implicitly captured by the query vectors, which themselves are just transformations of the inputs?
- What do the sinusoidal positional encodings capture that the ordering of the query vectors doesn't already capture? (See the sketch below this list.)
- Am I perhaps mistaken in thinking that transformers take in the entire input at once?
- How are words fed in?
- If we feed in the entire sentence at once, shouldn't the ordering be preserved?
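
To make the question concrete, here is a rough NumPy sketch of what I understand a single self-attention layer and the sinusoidal encodings to be (toy dimensions, one head, random weights, and my own helper names `self_attention` and `sinusoidal_pe`, so this is an illustration rather than a faithful Transformer). Permuting the input rows just permutes the output rows, which seems to be what people mean when they say attention by itself is order-blind, whereas adding the sinusoidal encodings first turns the shuffled sentence into a genuinely different input:

```python
# Rough sketch with toy sizes (d_model=8, 5 "words", one head, random weights);
# not a real Transformer, just the bare attention math.
import numpy as np

rng = np.random.default_rng(0)
d_model, seq_len = 8, 5

X = rng.normal(size=(seq_len, d_model))  # toy token embeddings, one row per word
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

def self_attention(X):
    """Single-head scaled dot-product self-attention."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(d_model)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

def sinusoidal_pe(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

perm = np.array([4, 2, 0, 3, 1])  # shuffle the word order

# Without positional encodings: shuffling the input rows just shuffles the
# output rows the same way -- the layer cannot tell the two orderings apart.
print(np.allclose(self_attention(X[perm]), self_attention(X)[perm]))  # True

# With sinusoidal encodings added first, the shuffled sentence is a genuinely
# different input (positions stay put while words move), so the outputs differ.
pe = sinusoidal_pe(seq_len, d_model)
print(np.allclose(self_attention(X[perm] + pe), self_attention(X + pe)[perm]))  # False
```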
Topic: transformer, deep-learning, neural-network, nlp, machine-learning
Category: Data Science