Attention transformation - matrices
Could somebody explain which matrix dimension should be used here for K? If the matrix is, for example, 3x3, should I just use 9?
Topic transformer matrix softmax deep-learning
Category Data Science
In addition to Noe's answer, you could consider $d_k$ to be the equivalent of the hidden-state dimensionality seen in recurrent layers, e.g. the units argument of tf.keras.layers.LSTM.
$d_k$ is the dimensionality of the query/key/value vectors, i.e. the length of a single vector, not the total number of entries in the matrix. In your example, the length of those vectors is 3, so $d_k = 3$, not 9.
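To make this concrete, here is a minimal sketch of scaled dot-product attention, $\mathrm{softmax}(QK^T / \sqrt{d_k})\,V$, in plain Python. The function name and the 3x3 values of Q, K and V are made up for illustration, and each row is assumed to hold one token's vector. The point is only that $d_k$ is read off as the length of one key vector.

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """Toy attention over lists of row vectors (one row per token)."""
    d_k = len(K[0])          # length of ONE key vector, not rows * columns
    scale = math.sqrt(d_k)   # the sqrt(d_k) in the denominator
    output = []
    for q in Q:
        # Scaled dot product of this query with every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        # Numerically stable softmax over the scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return output, d_k

# Hypothetical 3x3 example: 3 tokens, each with a 3-dimensional vector.
Q = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
K = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
V = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

out, d_k = scaled_dot_product_attention(Q, K, V)
print(d_k)  # 3
```

With a 3x3 K the scores are divided by $\sqrt{3}$, not $\sqrt{9}$. If K had shape 5x3 (five tokens), $d_k$ would still be 3.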