I am using the Bahdanau attention layer in TensorFlow for time series prediction; conceptually it is similar to its use in NLP applications. This is what a minimal example for a single layer looks like:

```python
import tensorflow as tf

dim = 7           # Feature dimension of each time step
Tq = 5            # Number of future time steps to predict
Tv = 13           # Number of historic lag time steps to consider
batch_size = 2**4

query = tf.random.uniform(shape=(batch_size, Tq, dim))
value = tf.random.uniform(shape=(batch_size, Tv, dim))
key = tf.random.uniform(shape=value.shape)

# Note: newer TF/Keras releases deprecate the `causal` constructor argument
# in favor of passing `use_causal_mask=True` when calling the layer.
layer = tf.keras.layers.AdditiveAttention(use_scale=True, causal=True)
output, score = layer(inputs=[query, value, key], return_attention_scores=True)
```

The score obtained in the last line seems to be …
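For reference, here are the shapes involved, plus a minimal sketch of the scoring computation as described in the Keras documentation for `AdditiveAttention`. This is an approximation rather than the layer's exact internals: it ignores the causal mask and the learned scale vector, so only the shapes and the softmax normalization are meant to match.

```python
# Continues the example above (assumes query, key, value, output, score exist).
print(output.shape)  # (16, 5, 7)  -> (batch_size, Tq, dim)
print(score.shape)   # (16, 5, 13) -> (batch_size, Tq, Tv)

# Rough reimplementation of additive (Bahdanau) scoring, per the Keras docs:
q = tf.expand_dims(query, axis=2)                    # (batch, Tq, 1, dim)
k = tf.expand_dims(key, axis=1)                      # (batch, 1, Tv, dim)
raw_scores = tf.reduce_sum(tf.tanh(q + k), axis=-1)  # (batch, Tq, Tv)
weights = tf.nn.softmax(raw_scores, axis=-1)         # each row sums to 1 over Tv
manual_output = tf.matmul(weights, value)            # (batch, Tq, dim)
```

So each row of the score tensor is a softmax distribution over the `Tv` historic time steps, one row per predicted step in `Tq`.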