Why does Bahdanau Attention Have to be Causal?
I am using the Bahdanau attention layer in TensorFlow for time series prediction; conceptually this is similar to its use in NLP applications. This is what the minimal example code for a single layer looks like:

```python
import tensorflow as tf

dim = 7
Tq = 5    # number of future time steps to predict
Tv = 13   # number of historic lag time steps to consider
batch_size = 2**4

query = tf.random.uniform(shape=(batch_size, Tq, dim))
value = tf.random.uniform(shape=(batch_size, Tv, dim))
key = tf.random.uniform(shape=value.shape)

layer = tf.keras.layers.AdditiveAttention(use_scale=True, causal=True)
output, score = layer(inputs=[query, value, key], return_attention_scores=True)
```

The score obtained in the last line seems to be …
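For context, here is a minimal sketch of how one might inspect the returned score tensor. It assumes the standard Keras behaviour that the scores have shape (batch_size, Tq, Tv) and that causal=True masks score positions where the value index exceeds the query index; the variable name above_diag and the check itself are purely illustrative:

```python
import numpy as np

# The score tensor relates each of the Tq query steps to each of the Tv value steps.
print(score.shape)  # expected: (batch_size, Tq, Tv) -> (16, 5, 13)

# With causal=True, entries where the value index j exceeds the query index i are
# masked before the softmax, so values strictly above the main diagonal of each
# (Tq, Tv) score matrix should come out as (numerically) zero.
above_diag = np.triu(np.ones((Tq, Tv), dtype=bool), k=1)
print(np.allclose(score.numpy()[:, above_diag], 0.0))
```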
Category: Data Science