Update for anyone googling this in 2021: Keras now ships a built-in `MultiHeadAttention` layer (`tf.keras.layers.MultiHeadAttention`). If key, query, and value are all the same tensor, this is self-attention.
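A minimal sketch of using the built-in layer for self-attention (shapes and `num_heads`/`key_dim` values here are arbitrary examples):

```python
import tensorflow as tf

# Self-attention: pass the same tensor as both query and value.
# (If `key` is omitted, Keras reuses `value` as the key.)
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=16)

x = tf.random.normal((2, 8, 16))  # (batch, seq_len, features)
out, scores = mha(x, x, return_attention_scores=True)

print(out.shape)     # (2, 8, 16) -- same as the query
print(scores.shape)  # (2, 2, 8, 8) -- (batch, heads, seq_len, seq_len)
```

The output keeps the query's sequence length and feature size, while the attention scores give one `seq_len × seq_len` matrix per head.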


Before that, a standalone implementation was available on PyPI.


A worked example is also available on Kaggle.
