Vector representation of documents for text classification

I'm looking for a proper method of creating document embeddings. I know that doc2vec will give me vector representations for a given corpus, but how do I embed new documents? I need to train a neural network to classify text, but I have no idea how new documents should be embedded properly.

Tags: doc2vec, word-embeddings, nlp, machine-learning

Category: Data Science


Not necessarily. Document embeddings are constructed from (sub)word embeddings. Nowadays, you should usually start from a pre-trained model; check TensorFlow Hub or Hugging Face. Even without fine-tuning, as long as your text is not too domain-specific, a pre-trained encoder should capture enough semantic relations.
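To make the idea concrete, here is a minimal sketch of building a document embedding from word embeddings. The tiny vocabulary, vector values, and averaging scheme are illustrative assumptions, not any particular library's API; a real setup would get the word vectors from a trained or pre-trained model:

```python
import math

# Toy word embeddings; a real model (doc2vec, or an encoder from
# TensorFlow Hub / Hugging Face) would supply these. Values are made up.
WORD_VECS = {
    "good":  [0.9, 0.1, 0.0],
    "movie": [0.1, 0.8, 0.2],
    "bad":   [-0.9, 0.1, 0.0],
    "film":  [0.1, 0.7, 0.3],
}
DIM = 3

def embed_document(text):
    """Average the embeddings of known words; unknown words are skipped.
    Any document, seen during training or brand new, maps into the
    same vector space, so it can be fed to the same classifier."""
    vecs = [WORD_VECS[w] for w in text.lower().split() if w in WORD_VECS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

v1 = embed_document("good movie")
v2 = embed_document("good film")   # unseen document, embedded the same way
v3 = embed_document("bad movie")
```

Because new documents are embedded with exactly the same function as the training corpus, `v2` lands near `v1` and far from `v3`; this consistency is what lets a classifier trained on corpus embeddings handle new text.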
