How to cluster sentences by company name, using a similarity metric, when each post mentions several companies?

My corpus consists of several posts, and each post contains information about several companies.

I want to cluster this information around a few company names that I specify. The clustering should use a similarity measure such as Euclidean distance or cosine similarity.

Which algorithm lets me cluster around company names that I specify, and which similarity measure should I use?

Topic text-mining nlp python

Category Data Science


One option is Anchored CorEx, a topic model that performs clustering guided by anchor words. For your problem, the anchor words would be the company names you specify.
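To make the anchor idea concrete, here is a minimal sketch of a simpler baseline. It is not Anchored CorEx itself, only the same intuition written with the standard library. Sentences that mention a company name seed that company's cluster, and each remaining sentence joins the cluster whose bag-of-words centroid it is closest to by cosine similarity. The function names, the naive sentence splitter, and the 0.2 threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence):
    # Lowercased token counts; no stop-word removal in this sketch.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_by_company(posts, companies, threshold=0.2):
    sentences = [s for post in posts for s in split_sentences(post)]
    clusters = defaultdict(list)
    unassigned = []

    # Pass 1: a sentence that names a company seeds that company's cluster.
    for s in sentences:
        hits = [c for c in companies
                if re.search(r"\b" + re.escape(c) + r"\b", s, re.IGNORECASE)]
        if hits:
            for c in hits:
                clusters[c].append(s)
        else:
            unassigned.append(s)

    # One centroid per company: summed token counts of its seed sentences.
    centroids = {c: sum((bag_of_words(s) for s in clusters[c]), Counter())
                 for c in companies}

    # Pass 2: attach each remaining sentence to the most similar centroid,
    # or drop it when nothing reaches the (arbitrary) threshold.
    for s in unassigned:
        vec = bag_of_words(s)
        best, score = max(((c, cosine(vec, centroids[c])) for c in companies),
                          key=lambda pair: pair[1])
        if score >= threshold:
            clusters[best].append(s)

    return dict(clusters)


if __name__ == "__main__":
    # Made-up example data.
    posts = ["Acme launched a new phone. The phone has a great camera. "
             "Globex opened a bank branch in Paris."]
    for company, group in cluster_by_company(posts, ["Acme", "Globex"]).items():
        print(company, group)
```

On the choice of measure: cosine similarity ignores sentence length and compares only the direction of the word-count vectors, which is usually preferable to Euclidean distance for short texts of varying length. In practice, TF-IDF weighting and stop-word removal would likely improve on the raw counts used here.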
