Why is an activation function not needed at runtime in a Word2Vec model?

In a trainable Word2Vec model, there are two different weight matrices: the matrix $W$ from the input-to-hidden layer and the matrix $W'$ from the hidden-to-output layer.

Referring to this article, I understand that the reason we have the matrix $W'$ is basically to compensate for the lack of an activation function in the output layer. Since an activation function is not needed at runtime, there is no activation function in the output layer. But we still need to update the input-to-hidden weight matrix $W$ through backpropagation to eventually reach the word embeddings most suitable for our use case, so there is this weight matrix $W'$ in the output layer.
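For concreteness, here is a minimal NumPy sketch of the skip-gram forward pass under this two-matrix setup. The sizes `V` and `N`, the variable names, and the random initialization are illustrative assumptions of mine, not code from the article; the point is only that both layers are purely linear.

```python
import numpy as np

V, N = 10000, 300          # illustrative vocabulary size and embedding dimension
rng = np.random.default_rng(0)
W  = rng.normal(scale=0.01, size=(V, N))   # input-to-hidden weights (the embeddings)
Wp = rng.normal(scale=0.01, size=(N, V))   # hidden-to-output weights (W')

def forward(word_idx):
    h = W[word_idx]        # "hidden layer" is just a row lookup in W: no activation
    u = h @ Wp             # output scores via W': again purely linear
    # Softmax only turns scores into probabilities for the training loss;
    # it is not a non-linearity between layers.
    e = np.exp(u - u.max())
    return e / e.sum()

probs = forward(42)        # predicted context-word distribution for word 42
```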

But my question is: why is an activation function not needed at runtime? Can anyone please explain?

Topic: activation-function, word2vec

Category: Data Science


I think a word2vec model is supposed to be a linear classifier. We want a model that can represent the relative meaning of words in a Euclidean, human-interpretable space. That way, we can calculate distances between word vectors that are understandable and easy for us humans to interpret.
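As a toy illustration of that idea, the sketch below compares words by cosine similarity in such a space. The three-word vocabulary and its vectors are invented by hand for illustration, not learned embeddings.

```python
import numpy as np

# Hand-picked toy vectors standing in for learned embedding rows of W.
emb = {
    "cat":   np.array([0.9, 0.1, 0.3]),
    "dog":   np.array([0.8, 0.2, 0.4]),
    "stone": np.array([0.1, 0.9, 0.7]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(emb["cat"], emb["dog"]))    # high: related words sit close together
print(cosine(emb["cat"], emb["stone"]))  # lower: unrelated words sit farther apart
```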


From this StackOverflow Question

While no activation is explicitly formulated, we could consider it to be a linear classification function. It appears that the dependencies the word2vec models try to capture can be achieved with a linear relation between the input words.

Adding a non-linear activation function would allow the neural network to map more complex functions, which could in turn lead it to fit the input onto something more complex that doesn't retain the dependencies word2vec seeks.
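As a toy illustration of what that linear relation buys us, the sketch below performs the well-known king - man + woman ≈ queen vector arithmetic. The 2-D vectors are hand-chosen so the offsets line up exactly; real Word2Vec embeddings learn such structure from data, and it is precisely these linear offsets that a non-linearity between layers could destroy.

```python
import numpy as np

# Hand-chosen 2-D toy vectors whose gender and royalty offsets are parallel.
vecs = {
    "king":  np.array([1.0, 1.0]),
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([0.0, 0.0]),
    "queen": np.array([0.0, 1.0]),
}

target = vecs["king"] - vecs["man"] + vecs["woman"]
nearest = min(vecs, key=lambda w: np.linalg.norm(vecs[w] - target))
print(nearest)  # "queen": the analogy holds because the space is linear
```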
