Vector representation of documents for text classification

I'm looking for a proper method of creating document embeddings. I know that doc2vec will give me vector representations for a given corpus, but how do I embed new documents? I need to train a neural network to classify text, but I have no idea how new documents should be embedded properly.

Tags: doc2vec, word-embeddings, nlp, machine-learning

Category: Data Science


Not necessarily. Document embeddings are constructed from (sub)word embeddings. Nowadays, you should usually start from a pre-trained model; check TensorFlow Hub or Hugging Face. Even without fine-tuning, as long as your text is not too domain-specific, a pre-trained encoder should capture enough semantic relations.
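To make the idea concrete, here is a minimal sketch of building a document embedding from word embeddings. The tiny vocabulary, vector values, and averaging scheme are illustrative assumptions, not any particular library's API; a real setup would get the word vectors from a trained or pre-trained model:

```python
import math

# Toy word embeddings; a real model (doc2vec, or an encoder from
# TensorFlow Hub / Hugging Face) would supply these. Values are made up.
WORD_VECS = {
    "good":  [0.9, 0.1, 0.0],
    "movie": [0.1, 0.8, 0.2],
    "bad":   [-0.9, 0.1, 0.0],
    "film":  [0.1, 0.7, 0.3],
}
DIM = 3

def embed_document(text):
    """Average the embeddings of known words; unknown words are skipped.
    Any document, seen during training or brand new, maps into the
    same vector space, so it can be fed to the same classifier."""
    vecs = [WORD_VECS[w] for w in text.lower().split() if w in WORD_VECS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

v1 = embed_document("good movie")
v2 = embed_document("good film")   # unseen document, embedded the same way
v3 = embed_document("bad movie")
```

Because new documents are embedded with exactly the same function as the training corpus, `v2` lands near `v1` and far from `v3`; this consistency is what lets a classifier trained on corpus embeddings handle new text.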
