Is there a way to initialize sentence embeddings for unsupervised text clustering that is better than averaged GloVe word vectors?

For unsupervised text clustering, the key factor is the initial embedding of the text.

If we want to apply DeepCluster to text, the problem is how to obtain a good initial embedding from a deep model.

BERT does not give good initial embeddings out of the box: its raw [CLS] and token embeddings are not tuned for sentence-level similarity.

If we do not use a deep model, is there a way to get embeddings that is better than GloVe word vectors?

Topic: representation, embeddings, word-embeddings, deep-learning, clustering

Category: Data Science


Generally, combining word vectors into a single sentence or document representation does not work extremely well, although averaging embeddings is used in fastText, and pooling is used in this paper.
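As a baseline, here is a minimal sketch of that averaging approach followed by k-means. It assumes gensim and scikit-learn are installed; "glove-wiki-gigaword-50" is one of gensim's downloadable pretrained vector sets, and the toy sentences are illustrative only.

```python
# Average-GloVe sentence embeddings + k-means clustering (minimal sketch).
import numpy as np
import gensim.downloader as api
from sklearn.cluster import KMeans

wv = api.load("glove-wiki-gigaword-50")  # 50-dim pretrained GloVe vectors

def sentence_embedding(sentence):
    # Average the vectors of in-vocabulary tokens; zeros if none are known.
    tokens = [t for t in sentence.lower().split() if t in wv]
    if not tokens:
        return np.zeros(wv.vector_size)
    return np.mean([wv[t] for t in tokens], axis=0)

sentences = [
    "the cat sat on the mat",
    "dogs are playing in the park",
    "stock prices fell sharply today",
    "the market rallied after the announcement",
]
X = np.stack([sentence_embedding(s) for s in sentences])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

A common refinement over the plain average is to weight each word vector by its inverse document frequency, which down-weights frequent, uninformative words.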

You can also use an autoencoder that tries to reconstruct the word distribution of a document, similar to a bag-of-words approach, like here.
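Below is a minimal sketch of such a bag-of-words autoencoder in PyTorch. The layer sizes and the softmax + cross-entropy reconstruction objective are illustrative assumptions, not the cited paper's exact architecture; the random input tensor stands in for real document counts.

```python
# Bag-of-words autoencoder sketch: the bottleneck z is the document embedding.
import torch
import torch.nn as nn

class BowAutoencoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )
        self.decoder = nn.Linear(embed_dim, vocab_size)

    def forward(self, bow):
        z = self.encoder(bow)        # document embedding for clustering
        logits = self.decoder(z)     # predicted word logits
        return z, logits

vocab_size = 1000
model = BowAutoencoder(vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake normalized bag-of-words frequencies standing in for real documents.
bow = torch.rand(32, vocab_size)
bow = bow / bow.sum(dim=1, keepdim=True)

for step in range(100):
    z, logits = model(bow)
    # Cross-entropy between predicted distribution and observed word freqs.
    loss = -(bow * torch.log_softmax(logits, dim=1)).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

After training, the encoder output z can be fed to k-means (or to DeepCluster-style alternating training) as the initial document representation.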
