Attention transformation - matrices

Could somebody explain which matrix dimension should be used here as $d_k$? And if the matrix is, for example, 3×3, should I just use 9?

Topic: transformer, matrix, softmax, deep-learning

Category: Data Science


In addition to Noe's answer, you could consider $d_k$ to be the equivalent of the hidden-state dimensionality seen in recurrent layers, e.g. the units argument in tf.keras.layers.LSTM.
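As a minimal sketch of that analogy (assuming TensorFlow 2.x; the batch size, sequence length, and feature size below are arbitrary, illustrative choices), units in an LSTM and key_dim in MultiHeadAttention both set the size of the per-step vectors:

```python
import tensorflow as tf

# Illustrative shapes (assumption): batch of 2 sequences, length 5, feature size 8.
x = tf.random.normal((2, 5, 8))

# In an LSTM, `units` fixes the hidden-state dimensionality carried from step to step.
lstm_out = tf.keras.layers.LSTM(units=16, return_sequences=True)(x)
print(lstm_out.shape)  # (2, 5, 16)

# In attention, `key_dim` plays the analogous role: it is d_k, the length of each
# query/key vector per head, and the quantity under the square root in the scaling.
mha = tf.keras.layers.MultiHeadAttention(num_heads=1, key_dim=16)
attn_out = mha(query=x, value=x)
print(attn_out.shape)  # (2, 5, 8) -- projected back to the query's feature size
```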


$d_k$ is the dimensionality of the query/key/value vectors, not the total number of entries in the matrix. In your example, each of those vectors has length 3, so $d_k = 3$ (not 9).
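For reference, this is where $d_k$ enters the scaled dot-product attention from "Attention Is All You Need": the attention scores are divided by $\sqrt{d_k}$, so with 3-dimensional key vectors the divisor is $\sqrt{3}$, not 9.

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,
\qquad d_k = 3 \;\Rightarrow\; \frac{Q K^{\top}}{\sqrt{3}}.
$$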
