How to train custom word2vec embeddings to find related articles?

I am a beginner in machine learning. My project is to build an AI-based search engine that shows related articles when a user searches on a website. For this I decided to train my own word embeddings.

I found two methods for this (there is a small sketch of both after the list):

  • One is to train a network to predict the next word (i.e. inputs = [the quick, the quick brown, the quick brown fox] and outputs = [brown, fox, lazy]).
  • The other is to train on pairs of nearby words (i.e. [brown, fox], [brown, quick]).
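
To make the two methods concrete, here is a rough Python sketch of the training examples I mean, built from one toy sentence (the tokenization and the window size are only for illustration):

    # Toy sentence; real training would use a whole corpus of articles.
    tokens = "the quick brown fox".split()

    # Method 1: next-word prediction
    # (input = the words so far, output = the word that follows).
    next_word_examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
    # -> [(['the'], 'quick'), (['the', 'quick'], 'brown'),
    #     (['the', 'quick', 'brown'], 'fox')]

    # Method 2: pairs of nearby words (skip-gram style), window of 1 here.
    window = 1
    pair_examples = [
        (tokens[i], tokens[j])
        for i in range(len(tokens))
        for j in range(max(0, i - window), min(len(tokens), i + window + 1))
        if i != j
    ]
    # -> [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ...]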

Which method should I use? And after training, how do I convert a sentence into a single vector so that I can apply cosine similarity? For example, the sentence "the quick brown fox" gives me 4 vectors; how should I combine them into one vector that I can feed, together with another sentence's vector, to cosine similarity (which takes only one vector per side)?

Tags: embeddings, word-embeddings, nlp



I find your question a bit convoluted, so I will answer with the following bullet points:

  • Train your own word embeddings: There are many implementations out there; gensim is one (see the first sketch after this list).
  • Find related articles: On that point, without being an expert, I would suggest doing some research on topic modelling. There are a lot of libraries for that as well (a minimal example follows below).
  • Word embeddings to sentence embeddings: This step is not as straightforward, because the semantics change when you simply combine word vectors. You can use Word Mover's Distance, or one of the numerous methods that train sentence embeddings in a supervised or unsupervised way (the last sketch below shows a simple averaging baseline and WMD).
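
For the first point, here is a minimal sketch of training word2vec with gensim (assuming gensim 4.x; the corpus, vector size and other parameters are placeholders you would tune on your own articles). Note that gensim's sg flag switches between skip-gram, which matches your "nearest words" method, and CBOW; your next-word setup is closer to a language model and is not what word2vec implements:

    from gensim.models import Word2Vec

    # Each article must be tokenized into a list of words;
    # this two-"article" corpus is only a placeholder.
    corpus = [
        ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"],
        ["machine", "learning", "powers", "modern", "search", "engines"],
    ]

    model = Word2Vec(
        corpus,
        vector_size=100,  # dimensionality of the word vectors
        window=5,         # context window around each word
        min_count=1,      # keep rare words for this toy example
        sg=1,             # 1 = skip-gram, 0 = CBOW
        epochs=50,
    )

    print(model.wv["fox"])                       # one vector per word
    print(model.wv.most_similar("fox", topn=3))  # nearest neighbours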
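
For the second point, one common topic-modelling route is LDA, which gensim also ships. A sketch under the same assumptions (num_topics and passes are arbitrary here):

    from gensim import corpora
    from gensim.models import LdaModel

    # Placeholder tokenized articles.
    texts = [
        ["search", "engine", "index", "query", "ranking"],
        ["fox", "dog", "animal", "forest"],
    ]

    dictionary = corpora.Dictionary(texts)
    bow_corpus = [dictionary.doc2bow(text) for text in texts]

    lda = LdaModel(bow_corpus, num_topics=2, id2word=dictionary, passes=10)

    # Topic distribution of the first article; articles whose
    # distributions are close can be treated as related.
    print(lda.get_document_topics(bow_corpus[0]))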
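
On the third point, and on the cosine-similarity part of your question: the simplest (and lossy) baseline is to average a sentence's word vectors into a single vector and compare those with cosine similarity. Word Mover's Distance skips that collapse and compares the sets of word vectors directly; gensim exposes it as wmdistance, which needs an extra optimal-transport package installed. A sketch reusing the model from the first snippet:

    import numpy as np

    def sentence_vector(tokens, wv):
        # Average the vectors of the in-vocabulary words (crude baseline).
        vectors = [wv[t] for t in tokens if t in wv]
        if not vectors:
            return np.zeros(wv.vector_size)
        return np.mean(vectors, axis=0)

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    s1 = "the quick brown fox".split()
    s2 = "the lazy dog".split()

    v1 = sentence_vector(s1, model.wv)
    v2 = sentence_vector(s2, model.wv)
    print(cosine_similarity(v1, v2))  # higher = more similar

    # Word Mover's Distance: lower = more similar; no single
    # sentence vector is ever built.
    print(model.wv.wmdistance(s1, s2))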
