Combine two sets of clusters

Question

Volka

August 18, 2017, 05:55

I have two sets of topics obtained from two different sets of newspaper articles.

In other words, Cluster_1 = $\{x_1, x_2, \ldots, x_n\}$ contains the main topics of newspaper set 'X', and Cluster_2 = $\{y_1, y_2, \ldots, y_n\}$ contains the main topics of newspaper set 'Y'.

Now I want to find clusters in the two sets that are similar/related by considering the cluster attributes as given in the example below.

Example 1:
**X1 in Cluster_1** is most similar/related to **Y2 in Cluster_2**
**X2 in Cluster_1** is most similar/related to **Yn in Cluster_2**
and so on.

Example 2:
News about Yet in Cluster_1 is most similar/related to News about Science in Cluster_2
News about Floods in Cluster_1 is most similar/related to News about Rains in Cluster_2

Since I am dealing with two separate sets of clusters, what would be a suitable measure or method for connecting the clusters in the two sets?

Topic dirichlet unsupervised-learning topic-model data-mining machine-learning

Category Data Science

Thomas Cleberg · Accepted Answer · August 17, 2017, 03:56

Each LDA topic is a probability distribution over words, so comparing two LDA topics really means computing the distance between two probability distributions.

One measure that's commonly used in these circumstances is the Hellinger distance. To find the closest match for $x_1$ among the $y$ topics, you would calculate the Hellinger distance between $x_1$ and each $y$ topic, then take the topic with the lowest distance.
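As a rough illustration of that matching step, here is a minimal sketch in Python with NumPy. It assumes each topic is already available as a row of word probabilities and, importantly, that both sets of topics are expressed over the same vocabulary in the same order; two models trained on separate corpora would need their vocabularies aligned first. The names `hellinger` and `closest_topics` and the toy numbers are made up for this example.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def closest_topics(x_topics, y_topics):
    """For each x topic, return (index of the nearest y topic, its distance)."""
    matches = []
    for p in x_topics:
        dists = [hellinger(p, q) for q in y_topics]
        j = int(np.argmin(dists))
        matches.append((j, float(dists[j])))
    return matches


# Toy topic-word distributions over a shared 4-word vocabulary (invented numbers).
x_topics = np.array([[0.7, 0.1, 0.1, 0.1],
                     [0.1, 0.1, 0.1, 0.7]])
y_topics = np.array([[0.1, 0.1, 0.2, 0.6],
                     [0.6, 0.2, 0.1, 0.1]])

matches = closest_topics(x_topics, y_topics)
for i, (j, d) in enumerate(matches):
    print(f"x topic {i} -> y topic {j} (Hellinger distance {d:.3f})")
```

The distance ranges from 0 (identical distributions) to 1 (no overlap), so the returned distance also gives a sense of how weak a "best" match is. Note that this pairing is one-directional: several x topics can map to the same y topic.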

Keep in mind that there's no guarantee that the "most similar" topic in this sense will seem even remotely similar to a human reader.
