Combine two sets of clusters

I have two sets of topics obtained from two different sets of newspaper articles.

In other words, Cluster_1 = $\{x_1, x_2, \ldots, x_n\}$ contains the main topics of newspaper set 'X', and Cluster_2 = $\{y_1, y_2, \ldots, y_n\}$ contains the main topics of newspaper set 'Y'.

Now I want to find which clusters in the two sets are similar/related, based on the cluster attributes, as in the examples below.

Example 1:
**X1 in Cluster_1** is mostly similar/related to **Y2 in Cluster_2**
**X2 in Cluster_1** is mostly similar/related to **Yn in Cluster_2**
and so on.

Example 2:
News about Yet in Cluster_1 is mostly similar/related to News about Science in Cluster_2
News about Floods in Cluster_1 is mostly similar/related to News about Rains in Cluster_2

Since I am dealing with two separate sets of clusters, what would be a suitable measure or method for connecting the clusters in the two sets?

Tags: dirichlet, unsupervised-learning, topic-model, data-mining, machine-learning



To compare two LDA topics, you're really trying to compute the distance between two probability distributions: each topic is a distribution over the words in the vocabulary.
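One practical wrinkle: if the X and Y topics come from two separately trained models, their vocabularies usually differ, and a distance between two distributions is only meaningful when both are indexed by the same words. A minimal sketch, assuming each topic is available as a plain word-to-probability dict (the `align_topics` helper and the toy topics are hypothetical), maps both onto their union vocabulary and gives a word probability 0 in the topic that lacks it:

```python
def align_topics(topic_x, topic_y):
    """Map two word->probability dicts onto a shared, sorted vocabulary.

    Words missing from one topic get probability 0.0 in that topic, so the
    two returned lists are indexed by exactly the same words.
    """
    vocab = sorted(set(topic_x) | set(topic_y))
    p = [topic_x.get(w, 0.0) for w in vocab]
    q = [topic_y.get(w, 0.0) for w in vocab]
    return vocab, p, q


# Hypothetical topics from two separately trained models (made-up numbers)
floods = {"flood": 0.5, "water": 0.3, "river": 0.2}
rains = {"rain": 0.4, "water": 0.4, "storm": 0.2}
vocab, p, q = align_topics(floods, rains)
print(vocab)
print(p)
print(q)
```

If you keep only each topic's top-N words instead of its full distribution, renormalize each aligned list so it sums to 1 before computing any distance.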

One such measure that's commonly used in these circumstances is the Hellinger distance. To find the closest match for $x_1$ among the $y$ topics, you would calculate the Hellinger distance between $x_1$ and each $y$ topic, then pick the $y$ topic with the lowest distance.
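For discrete distributions $P$ and $Q$ over the same words, the Hellinger distance is $H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_i \left(\sqrt{p_i} - \sqrt{q_i}\right)^2}$, which ranges from 0 (identical distributions) to 1 (no words in common). The sketch below implements that formula plus the nearest-match search in plain Python. It assumes every topic has already been written as a list of probabilities over one shared vocabulary; the helper names and the toy numbers are illustrative only:

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support.

    Returns 0.0 for identical distributions and 1.0 when they share no support.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same vocabulary")
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)


def closest_topic(x_topic, y_topics):
    """Return (index, distance) of the y topic with the lowest Hellinger distance."""
    distances = [hellinger(x_topic, y) for y in y_topics]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]


# Toy distributions over a shared 4-word vocabulary (made-up numbers)
x1 = [0.7, 0.2, 0.1, 0.0]
y_topics = [
    [0.0, 0.0, 0.5, 0.5],      # y1
    [0.6, 0.3, 0.1, 0.0],      # y2
    [0.25, 0.25, 0.25, 0.25],  # y3
]
idx, dist = closest_topic(x1, y_topics)
print(f"x1 best matches y{idx + 1} (H = {dist:.3f})")
```

Here `x1` is matched to the second `y` topic. Note that `closest_topic` always returns some match, even when every candidate is far away, so it can help to look at the distance value as well. If you already use gensim, `gensim.matutils.hellinger` computes the same distance.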

Keep in mind that there's no guarantee whatsoever that the "most similar" topic in this sense would be remotely, subjectively similar.
