Why do Transformers need positional encodings?

Given that, at least in the first self-attention layer of the encoder, inputs have a direct correspondence with outputs, I have the following questions.

  • Isn't ordering already implicitly captured by the query vectors, which themselves are just transformations of the inputs?
  • What do the sinusoidal positional encodings capture that the ordering of the query vectors doesn't already?
  • Am I perhaps mistaken in thinking that transformers take in the entire input at once?
  • How are words fed in?
  • If we feed in the entire sentence at once, shouldn't the ordering be preserved?



Consider the input sentence "I am good".

In RNNs, we feed the sentence to the network word by word: first the word "I" is passed as input, then the word "am", and so on. Because each step's hidden state depends on the previous one, the word order is built into the computation itself, so the network sees the sentence in sequence.

But the transformer network does not use recurrence. So, instead of feeding the sentence word by word, we feed all the words in the sentence to the network in parallel. Feeding the words in parallel decreases the training time and also helps in learning long-term dependencies.
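To make the contrast concrete, here is a minimal sketch, assuming PyTorch and toy dimensions (the layer sizes and variable names are illustrative, not from the original post): the RNN consumes one word per step, while a self-attention layer receives the entire embedding matrix in a single call.

```python
# Toy contrast between the two feeding styles (PyTorch assumed; sizes are illustrative).
import torch
import torch.nn as nn

words = ["I", "am", "good"]
d_model = 8
embeddings = torch.randn(len(words), d_model)  # one embedding row per word

# RNN style: the words are consumed one step at a time,
# each step depending on the previous hidden state.
rnn = nn.RNN(input_size=d_model, hidden_size=d_model, batch_first=True)
rnn_out, _ = rnn(embeddings.unsqueeze(0))      # internally still sequential

# Transformer style: the whole (1, seq_len, d_model) matrix goes in at once.
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=2, batch_first=True)
x = embeddings.unsqueeze(0)
attn_out, _ = attn(x, x, x)                    # every word attends to every word in one call
```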

Since we feed the words to the transformer in parallel, the word order (the position of each word in the sentence) is no longer conveyed by the order of computation. So we have to give the transformer some information about the word order so that it can understand the sentence.
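Why can't the network recover the order on its own? Without positional information, scaled dot-product self-attention is permutation-equivariant: shuffle the input rows and each word's output vector stays the same, only reordered. Here is a minimal NumPy sketch (identity projections assumed, i.e., no learned query/key/value weights) demonstrating this:

```python
import numpy as np

def self_attention(X):
    # Scaled dot-product self-attention with identity projections
    # (no learned W_Q/W_K/W_V), enough to show the ordering issue.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))   # embeddings for "I", "am", "good"
perm = [2, 0, 1]              # reorder to "good", "I", "am"

out = self_attention(X)
out_perm = self_attention(X[perm])

# Each word's output vector is identical; only the rows are reordered.
assert np.allclose(out[perm], out_perm)
```

In effect the model sees a bag of word vectors, which is why we have to inject the position explicitly. This also addresses the earlier question: the query vectors are transformations of the inputs, but nothing about them depends on where in the sentence each input sits.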

If we pass the input matrix (the stacked word embeddings) directly to the transformer, it cannot infer the word order. So, instead of feeding the input matrix directly, we add to it some information indicating the position of each word, so that the network can understand the meaning of the sentence. The technique used for this is called positional encoding. Positional encoding, as the name suggests, is an encoding that indicates the position of each word in the sentence (the word order).
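The sinusoidal encoding from the original Transformer paper (Vaswani et al., 2017) assigns position pos and dimension pair i the values sin(pos / 10000^(2i/d_model)) and cos(pos / 10000^(2i/d_model)), and the resulting matrix is added to the word embeddings. A minimal NumPy sketch (toy dimensions, illustrative names):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    positions = np.arange(seq_len)[:, None]              # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]              # (1, d_model/2), even indices 2i
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# The encoding is added (not concatenated) to the word embeddings
# before the first encoder layer, e.g. for "I am good":
pe = sinusoidal_positional_encoding(seq_len=3, d_model=8)
# inputs = word_embeddings + pe   (word_embeddings would be a (3, 8) matrix)
```

Because neighbouring positions get smoothly varying values at many frequencies, and a fixed offset between positions corresponds to a linear transformation of these sines and cosines, the model can learn to attend by relative position, which is exactly what the raw ordering of the query vectors does not provide.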
