Attention transformation - matrices

Could somebody explain which matrix dimension should be used here as $d_k$? And if the matrix is, for example, 3×3, should I just use 9?

Topic: transformer, matrix, softmax, deep-learning

Category: Data Science


In addition to Noe's answer, you could consider $d_k$ to be the equivalent of the hidden-state dimensionality seen in recurrent layers, e.g. the units argument in tf.keras.layers.LSTM.
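As a minimal sketch of that analogy (assuming TensorFlow 2.x; the batch size, sequence length, and feature size below are arbitrary, illustrative choices), units in an LSTM and key_dim in MultiHeadAttention both set the size of the per-step vectors:

```python
import tensorflow as tf

# Illustrative shapes (assumption): batch of 2 sequences, length 5, feature size 8.
x = tf.random.normal((2, 5, 8))

# In an LSTM, `units` fixes the hidden-state dimensionality carried from step to step.
lstm_out = tf.keras.layers.LSTM(units=16, return_sequences=True)(x)
print(lstm_out.shape)  # (2, 5, 16)

# In attention, `key_dim` plays the analogous role: it is d_k, the length of each
# query/key vector per head, and the quantity under the square root in the scaling.
mha = tf.keras.layers.MultiHeadAttention(num_heads=1, key_dim=16)
attn_out = mha(query=x, value=x)
print(attn_out.shape)  # (2, 5, 8) -- projected back to the query's feature size
```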


$d_k$ is the dimensionality of the query/key/value vectors, not the total number of entries in the matrix. In your example, each of those vectors has length 3, so $d_k = 3$ (not 9).
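For reference, this is where $d_k$ enters the scaled dot-product attention from "Attention Is All You Need": the attention scores are divided by $\sqrt{d_k}$, so with 3-dimensional key vectors the divisor is $\sqrt{3}$, not 9.

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,
\qquad d_k = 3 \;\Rightarrow\; \frac{Q K^{\top}}{\sqrt{3}}.
$$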
