How to compute sentence embedding from word2vec model?

I am new to NLP and I'm trying to compute embeddings for a clustering problem. I have created a word2vec model using Python's gensim library, but I am wondering about the following:

The word2vec model maps each word to a vector of size vector_size. However, in later steps of the clustering approach, I realised I was clustering based on single words instead of the sentences I had in my dataset at the beginning.

Let's say my vocabulary is composed of the two words foo and bar, mapped as follows:

foo: [0.0045, -0.0593, 0.0045]
bar: [-0.943, 0.05311, 0.5839]

If I have a sentence bar foo, how can I embed it? I mean, how can I get the vector of the entire sentence as a whole?

Thanks in advance.

Tags: word2vec, word-embeddings, nlp, python

Category: Data Science


The usual approach is to average the vectors of all the words in the sentence (skipping any words that are not in the model's vocabulary), which gives you a single fixed-size vector for the whole sentence that you can then feed into your clustering step.
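A minimal sketch of this averaging, using the two toy vectors from the question. For simplicity it uses a plain dict in place of a trained model; with gensim you would look words up via `model.wv[word]` instead, and the `sentence_vector` helper name is just an illustration:

```python
import numpy as np

# Toy vocabulary from the question; in practice these vectors
# would come from a trained word2vec model (e.g. gensim's model.wv).
word_vectors = {
    "foo": np.array([0.0045, -0.0593, 0.0045]),
    "bar": np.array([-0.943, 0.05311, 0.5839]),
}

def sentence_vector(sentence, vectors):
    """Embed a sentence as the mean of its in-vocabulary word vectors."""
    words = [w for w in sentence.split() if w in vectors]
    if not words:
        # No known words: fall back to a zero vector of the right size.
        dim = len(next(iter(vectors.values())))
        return np.zeros(dim)
    return np.mean([vectors[w] for w in words], axis=0)

emb = sentence_vector("bar foo", word_vectors)
print(emb)  # element-wise mean of the "bar" and "foo" vectors
```

Note that plain averaging ignores word order ("bar foo" and "foo bar" get the same vector), which is often acceptable for clustering but worth keeping in mind.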
