Deep learning techniques for concept similarity?
Given a corpus of product descriptions (say, vacuum cleaners), I'm looking for a way to group the documents that are all of the same type (where a type can be cordless vacuums
, shampooer
, carpet cleaner
, industrial vacuum
, etc.).
The approach I'm exploring is to use NER. I'm labeling a set of these documents with tags such as (KIND, BRAND, MODEL
). The theory is that I'd then run new documents through the model, and the tokens corresponding to those tags would be extracted. I would then construct a feature vector for each document comprised of a boolean value for each of the tags. From there, a simple dot product would show all documents related to some base document (as in, these documents are all similar to this one document).
Question
What are other general approaches that might be a good fit for this task?
Topic similar-documents deep-learning nlp
Category Data Science