Why does Keras only have 3 types of attention layers?

The Keras layers list includes only three attention layers (a minimal usage sketch follows the list):

  1. MultiHeadAttention layer
  2. Attention layer
  3. AdditiveAttention layer
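
For reference, this is how the three built-in layers are called; the tensor shapes below are made up for illustration only:

```python
import tensorflow as tf

# Toy query/value tensors: (batch, target_seq_len, dim) and (batch, source_seq_len, dim)
query = tf.random.normal((2, 5, 16))
value = tf.random.normal((2, 8, 16))

# 1. MultiHeadAttention: scaled dot-product attention over several heads
mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)
out_mha = mha(query=query, value=value)                        # (2, 5, 16)

# 2. Attention: Luong-style dot-product attention
out_dot = tf.keras.layers.Attention()([query, value])          # (2, 5, 16)

# 3. AdditiveAttention: Bahdanau-style additive attention
out_add = tf.keras.layers.AdditiveAttention()([query, value])  # (2, 5, 16)
```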

However, in theory there are many more types of attention, e.g. (some of these may be synonyms):

  • Global
  • Local
  • Hard
  • Bahdanau attention
  • Luong attention
  • Self
  • Additive
  • Latent
  • what else?

Are the other types simply not practical, or can they be derived from the existing implementations? Can someone please shed some light, with examples?

Topic attention-mechanism keras deep-learning

Category Data Science


You can always build your own custom attention layer with TensorFlow or PyTorch. Keras now has three popular built-in attention layers; before those were added, we had to write custom layers even in Keras. The difference between Bahdanau and Luong attention is how the attention weights are calculated: Bahdanau is additive, while Luong is multiplicative (dot product). In both of these models the key and the value are the same. For more advanced transformer techniques, refer to BERT or DistilBERT; there are also various variations of BERT.
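
To make the additive vs. multiplicative distinction concrete, here is a minimal sketch of the two scoring functions; the class name, the `units` parameter, and the helper functions are illustrative, not part of any library:

```python
import tensorflow as tf

def luong_score(query, keys):
    """Multiplicative (dot-product) score: score(q, k) = q . k"""
    # query: (batch, Tq, d), keys: (batch, Tv, d) -> scores: (batch, Tq, Tv)
    return tf.matmul(query, keys, transpose_b=True)

class BahdanauScore(tf.keras.layers.Layer):
    """Additive score: score(q, k) = v^T tanh(W1 q + W2 k)"""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.v = tf.keras.layers.Dense(1)

    def call(self, query, keys):
        q = tf.expand_dims(query, 2)                        # (batch, Tq, 1, d)
        k = tf.expand_dims(keys, 1)                         # (batch, 1, Tv, d)
        scores = self.v(tf.tanh(self.W1(q) + self.W2(k)))   # (batch, Tq, Tv, 1)
        return tf.squeeze(scores, -1)                       # (batch, Tq, Tv)

def attend(scores, values):
    """Turn either score matrix into attention weights and a context."""
    weights = tf.nn.softmax(scores, axis=-1)                # (batch, Tq, Tv)
    return tf.matmul(weights, values)                       # (batch, Tq, d)
```

Either score matrix is then softmaxed over the source positions and used to weight the values, which is essentially what the built-in Attention and AdditiveAttention layers do internally.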
