Create clusters based on specific keywords

I am working on raw text data. I am using clustering to put together common words in the documents. My requirement is to create clusters based on a specific list of words i.e I want to get a group of words that are typically found with the user-given list of words. Visually, the clusters should look like below. Typically, the clustering techniques are focused on creating segregated clusters while I need segregated clusters with some overlap. The image shows the view of the expected results. I have tried using k-means clustering, the Apriori algorithm, and PrefixSpan in Python. But my desired result is not achieved.

Any suggestion is appreciated.

Topic python-3.x association-rules nlp k-means clustering

Category Data Science


Methods for generating fuzzy (or soft) rather than crisp clusters may be applicable to your problem. One implementation for soft clustering in Python is a variant of DBSCAN called HDBSCAN. This link explains soft clustering using this algorithm. In soft clustering each point's similarity to each cluster can be measured. If a point is sufficiently similar to multiple clusters, then the clusters may overlap in the feature space.

There are also algorithms for explicitly generating overlapping cluster assignments from the literature such as:

  • In Overlapping Correlation Clustering (Bonchi 2013) each point may be assigned to multiple clusters based on pairwise similarities between points in the input set.
  • In Model-based Overlapping Clustering (Banerjee 2005) a generative approach is used to modeling the clusters.

For a survey of overlapping clustering approaches, see Overlapping Clustering: A review (Baadel 2016).

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.