NN embedding layer

Several neural network libraries such as TensorFlow and PyTorch offer an Embedding layer. Having implemented word2vec in the past, I understand the reasoning behind wanting a lower-dimensional representation.

However, it would seem the embedding layer is just a linear layer. All other things being equal, would an embedding layer not just learn the same weights as the equivalent linear layer? If so, then what are the advantages of using an embedding layer?

In the case of word2vec, the lower-dimensional representation can be used for other tasks (the famous king/queen example). However, if your embedding layer will never be used for alternative tasks, what would be its purpose?

Topic vector-space-models tensorflow word-embeddings deep-learning machine-learning

Category Data Science


At a theoretical level, the embedding layer is a linear layer; there is no difference at all. In practice, however, if you are building deep learning software, you do have to distinguish between them. It does not make sense to apply an embedding layer via a traditional matrix multiplication, because the input matrix is extremely sparse (a one-hot vector per token). It is therefore much faster to look up the corresponding row directly, even though this is mathematically equivalent to the matrix multiplication.
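
A minimal sketch of that equivalence, here written in PyTorch (the framework choice is ours; any library with an embedding layer behaves the same): the lookup returns exactly the rows that a one-hot matrix multiplication would.

```python
import torch
import torch.nn.functional as F

vocab_size, embed_dim = 10, 4
emb = torch.nn.Embedding(vocab_size, embed_dim)

idx = torch.tensor([2, 7, 2])  # token indices for a small batch

# The embedding layer just indexes rows of its weight matrix.
lookup = emb(idx)

# The equivalent linear operation: one-hot inputs times the same matrix.
one_hot = F.one_hot(idx, num_classes=vocab_size).float()
matmul = one_hot @ emb.weight

assert torch.allclose(lookup, matmul)
```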


The embedding layer maps a vocabulary index to a dense vector, so it acts as a lookup layer, and (if set to trainable) only the rows corresponding to the words that actually occur in a training batch receive weight updates. A linear layer over a dense input, by contrast, would have all of its weights updated by every batch and would not provide lookup functionality (every word fed to the input would pass through the same weights).
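
To illustrate this sparse-update behaviour, here is a small PyTorch sketch (again, just one framework choice): after a backward pass, only the gradient rows for the indices seen in the batch are nonzero.

```python
import torch

emb = torch.nn.Embedding(10, 4)
idx = torch.tensor([2, 7])

# A dummy loss so we can backpropagate through the lookup.
emb(idx).sum().backward()

# Only the rows for indices 2 and 7 received any gradient.
nonzero_rows = emb.weight.grad.abs().sum(dim=1).nonzero().flatten()
print(nonzero_rows)  # tensor([2, 7])
```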

Also, you're right to consider word2vec differently. With a custom trainable embedding layer, the dense vectors are optimized (by SGD) for the specific task you are solving, whereas models like word2vec are trained on a language-modelling-style objective and find a representation that is optimal in a general semantic sense.

Therefore, depending on the size of your data, the representation found during training may be better for your task than the one found by a generic, task-agnostic word2vec or other pretrained model.
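
Both options can even be combined. As a sketch in PyTorch (the `pretrained` tensor below is a stand-in for real word2vec vectors), one can seed an embedding layer with pretrained vectors and choose whether to fine-tune them:

```python
import torch

# Stand-in for real word2vec vectors: shape (vocab_size, embed_dim).
pretrained = torch.randn(10, 4)

# Frozen: reuse the semantic representation as-is.
frozen = torch.nn.Embedding.from_pretrained(pretrained, freeze=True)

# Fine-tuned: start from word2vec, but let SGD adapt the vectors to the task.
tuned = torch.nn.Embedding.from_pretrained(pretrained.clone(), freeze=False)
```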
