Clustering of multi-label data

Question

Ahron

May 10, 2022, 11:02

The dataset consists of

1) a set of objects and

2) a set of labels, which are used to describe the objects.

For the moment, for simplicity's sake, each label can be marked as either true or false (in a more complex setup, each label will have a value of 1-10).

However, not all labels are actually applied to all objects (in principle, every label can and should be applied to every object, but in practice they are not). Also, when a label isn't applied to an object, one cannot simply assume that the label's value for that particular object is false. Therefore, the missing labels will be ignored in the model.

I need to cluster the objects based on their labels.

Any tips on how and what algorithms to use will be appreciated.

Topic labels multilabel-classification classification clustering

Category Data Science

Brian Spiering · Accepted Answer · March 27, 2021, 13:39

It is possible to cluster the objects based on their labels by treating the labels as features. Typically, labels are treated as targets, which would frame the problem as a supervised machine learning problem.
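As a minimal sketch of treating labels as features, each object could be encoded as a vector with one entry per label. The object names, label names, and the `to_feature_vector` helper below are hypothetical, chosen only for illustration; `None` marks a label that was never applied, so it is not confused with false.

```python
# Hypothetical annotations: object -> {label: value}.
# A label missing from the inner dict was never applied to that object.
annotations = {
    "obj_a": {"red": True, "round": True},
    "obj_b": {"red": True, "round": False, "heavy": True},
    "obj_c": {"heavy": False},
}

# Fix a label order so every object gets a vector of the same length.
labels = sorted({label for applied in annotations.values() for label in applied})


def to_feature_vector(applied, labels):
    """Return one entry per label: True, False, or None when the label is missing."""
    return [applied.get(label) for label in labels]


features = {
    name: to_feature_vector(applied, labels)
    for name, applied in annotations.items()
}
```

Keeping `None` as a third state lets a later distance computation skip unobserved labels instead of counting them as false.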

Since the labels are nominal-valued, you will need to use an appropriate distance metric. The Jaccard distance, which is one minus the Jaccard index, is one option.
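The sketch below shows one way this could look, under stated assumptions rather than as a definitive method. The Jaccard distance is computed only over labels observed on both objects, which is one possible reading of "missing labels are ignored". Pairs with no shared positive evidence are assigned distance 1.0, which is an assumption. The toy average-linkage loop, the `threshold` value, and the example vectors are all illustrative.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance restricted to labels observed (not None) on both objects."""
    both = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    union = sum(1 for x, y in both if x or y)
    if union == 0:
        # Assumption: no co-observed label is true on either object,
        # so there is no evidence of similarity; treat as maximally distant.
        return 1.0
    intersection = sum(1 for x, y in both if x and y)
    return 1.0 - intersection / union


def cluster(vectors, threshold=0.5):
    """Toy average-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(jaccard_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters (i < j by construction).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical objects described by four labels; None means "not applied".
vectors = [
    [True, True, None, False],
    [True, True, False, None],
    [False, None, True, True],
    [None, False, True, True],
]
result = cluster(vectors)
```

For real data, a library implementation of hierarchical clustering run on a precomputed distance matrix would likely replace the toy loop; the masked distance function is the part specific to this problem.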
