Document similarity when the document size is less than 30 tokens?

I was solving a problem of comparing about 3 million documents from 2018 against documents from 2019. There are three text attributes to be compared between one item and the other. I used Latent Semantic Indexing (LSI) on one variable, containing about 5 word tokens, with reasonable performance.
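As a minimal sketch of the LSI-plus-cosine-similarity setup described above, using scikit-learn's `TfidfVectorizer` and `TruncatedSVD` (the documents, the number of components, and the library choice are illustrative assumptions, not details from the question):

```python
# Sketch: LSI (TruncatedSVD over TF-IDF) followed by cosine similarity,
# for very short item-description documents. All inputs are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs_2018 = ["stainless steel hex bolt m8", "copper wire 2mm insulated"]
docs_2019 = ["hex bolt stainless steel m8", "insulated copper wire 2 mm"]

# Fit TF-IDF on both years together so the vocabularies are shared.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs_2018 + docs_2019)

# LSI: project the TF-IDF matrix onto a low-rank latent space.
# With documents this short, n_components must stay well below the
# vocabulary size or the projection is essentially lossless.
lsi = TruncatedSVD(n_components=2, random_state=0)
vecs = lsi.fit_transform(tfidf)

# Compare every 2018 document against every 2019 document.
sim = cosine_similarity(vecs[: len(docs_2018)], vecs[len(docs_2018):])
print(sim.shape)  # one row per 2018 doc, one column per 2019 doc
```

For 3 million documents the pairwise similarity matrix would not fit in memory, so in practice one would use a nearest-neighbour index rather than the full `cosine_similarity` call shown here.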

  • What is the minimum document size for LSI/LDA to compute document similarity in a multivariate problem [Item Description (text, ~10 tokens), Item Specification (text, ~5 tokens)]?
  • I used cosine similarity and string distances to measure how closely the 2018 descriptions match the 2019 descriptions. Are there any other statistical methods to evaluate the model?
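The string-distance side of the comparison above can be sketched with the standard library alone; `difflib.SequenceMatcher` is used here as a stand-in, since the question does not say which string-distance metric was actually applied:

```python
# Sketch: a character-level similarity ratio between a 2018 and a 2019
# description. SequenceMatcher.ratio() returns a value in [0, 1],
# where 1.0 means the strings are identical.
from difflib import SequenceMatcher

desc_2018 = "stainless steel hex bolt m8"   # illustrative example
desc_2019 = "hex bolt stainless steel m8"   # same tokens, reordered

ratio = SequenceMatcher(None, desc_2018, desc_2019).ratio()
print(ratio)
```

Note that character-level ratios penalise token reordering heavily, which is one reason to combine them with a bag-of-words measure such as cosine similarity.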

Topic: lsi, similarity

Category Data Science
