Can we use doc2vec to detect outlier documents?

I have a set of documents and I want to identify and remove the outlier documents. Can doc2vec be used for this task?

Or are there any more recent, promising algorithms that I can use instead?

EDIT

I am currently using a bag of words model to identify outliers.

Topic gensim word2vec outlier nlp data-mining

Category Data Science


One way to approach it:

  1. Define a central tendency of the documents, i.e. a location in vector space (for example, the mean of the document vectors).

  2. Then, define a distance metric (e.g., cosine, Minkowski, or Mahalanobis).

  3. Lastly, set a threshold on that distance beyond which a document is considered an outlier.
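
The three steps above can be sketched in a few lines of plain Python. This is only an illustration under some assumptions: each document has already been turned into a fixed-length, non-zero vector (from doc2vec, bag of words, or anything else), the center is the component-wise mean, the distance is cosine distance, and the threshold is the median distance plus k times the median absolute deviation. The function names, the toy vectors, and the value k=3.0 are made up for the example; none of them come from a particular library.

```python
import math
from statistics import mean, median


def centroid(vectors):
    """Step 1: central tendency, here the component-wise mean."""
    return [mean(component) for component in zip(*vectors)]


def cosine_distance(u, v):
    """Step 2: cosine distance, 1 - cos(angle). Assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def find_outliers(vectors, k=3.0):
    """Step 3: flag documents farther than median + k * MAD from the center.

    k is a hypothetical tuning knob; pick it by inspecting your own data.
    """
    center = centroid(vectors)
    distances = [cosine_distance(v, center) for v in vectors]
    med = median(distances)
    mad = median([abs(d - med) for d in distances])
    threshold = med + k * mad
    return [i for i, d in enumerate(distances) if d > threshold]


# Toy 2-D "document vectors"; the last one points in a different direction.
doc_vectors = [
    [1.0, 0.1],
    [0.9, 0.2],
    [1.0, 0.0],
    [0.8, 0.1],
    [0.1, 1.0],
]

outlier_indices = find_outliers(doc_vectors)
print(outlier_indices)
```

On this toy data only the last vector is flagged. A median-based threshold is used here because a mean and standard deviation would themselves be pulled toward the outliers; swapping in another center, metric, or cutoff only changes the corresponding function.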
