Is there a way to initialize sentence embeddings for unsupervised text clustering that is better than averaged GloVe word vectors?

For unsupervised text clustering, the key factor is the initial embedding of the text.

If we want to apply DeepCluster to text, the problem is how to obtain a good initial embedding from a deep model.

BERT does not give good initial embeddings out of the box: its raw [CLS] and token embeddings are not tuned for sentence-level similarity.

If we do not use a deep model, is there a way to get embeddings that is better than GloVe word vectors?

Topic: representation, embeddings, word-embeddings, deep-learning, clustering

Category: Data Science


Generally, combining word vectors into a single sentence or document representation does not work extremely well, although averaging embeddings is used in fastText, and pooling is used in this paper.
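As a baseline, here is a minimal sketch of that averaging approach followed by k-means. It assumes gensim and scikit-learn are installed; "glove-wiki-gigaword-50" is one of gensim's downloadable pretrained vector sets, and the toy sentences are illustrative only.

```python
# Average-GloVe sentence embeddings + k-means clustering (minimal sketch).
import numpy as np
import gensim.downloader as api
from sklearn.cluster import KMeans

wv = api.load("glove-wiki-gigaword-50")  # 50-dim pretrained GloVe vectors

def sentence_embedding(sentence):
    # Average the vectors of in-vocabulary tokens; zeros if none are known.
    tokens = [t for t in sentence.lower().split() if t in wv]
    if not tokens:
        return np.zeros(wv.vector_size)
    return np.mean([wv[t] for t in tokens], axis=0)

sentences = [
    "the cat sat on the mat",
    "dogs are playing in the park",
    "stock prices fell sharply today",
    "the market rallied after the announcement",
]
X = np.stack([sentence_embedding(s) for s in sentences])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

A common refinement over the plain average is to weight each word vector by its inverse document frequency, which down-weights frequent, uninformative words.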

You can also use an autoencoder that tries to reconstruct the word distribution of a document, similar to a bag-of-words approach, like here.
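Below is a minimal sketch of such a bag-of-words autoencoder in PyTorch. The layer sizes and the softmax + cross-entropy reconstruction objective are illustrative assumptions, not the cited paper's exact architecture; the random input tensor stands in for real document counts.

```python
# Bag-of-words autoencoder sketch: the bottleneck z is the document embedding.
import torch
import torch.nn as nn

class BowAutoencoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )
        self.decoder = nn.Linear(embed_dim, vocab_size)

    def forward(self, bow):
        z = self.encoder(bow)        # document embedding for clustering
        logits = self.decoder(z)     # predicted word logits
        return z, logits

vocab_size = 1000
model = BowAutoencoder(vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake normalized bag-of-words frequencies standing in for real documents.
bow = torch.rand(32, vocab_size)
bow = bow / bow.sum(dim=1, keepdim=True)

for step in range(100):
    z, logits = model(bow)
    # Cross-entropy between predicted distribution and observed word freqs.
    loss = -(bow * torch.log_softmax(logits, dim=1)).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

After training, the encoder output z can be fed to k-means (or to DeepCluster-style alternating training) as the initial document representation.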
