Clustering method that allows choosing a minimum cluster size
I have a multilabel dataset showing an extreme case of imbalance. I was thinking of clustering the less populated classes into bigger clusters of size at least N.
My question: is there a clustering algorithm that allows one to merge together only the smaller, similar groups into clusters of size at least N? The idea is that the algorithm should ignore those labels that are already populated enough and focus on clustering together labels that are still underrepresented.
To give an example, suppose we have five groups $c_1, \dots, c_5$ with sizes $n_1=10, n_2=8, n_3=6, n_4=4, n_5=3$, and that the minimum cluster size is $N=9$. Suppose also that $c_1, c_2, c_3$ are all very similar. Then the algorithm should ignore $c_1$, since $n_1 \ge N$, and should merge $c_2$ and $c_3$ into $c_{2,3}$. And if $c_{2,3}, c_4, c_5$ are also very similar to each other, then the algorithm should combine $c_4$ and $c_5$, ignoring the already big enough $c_{2,3}$.
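The behaviour in this example can be sketched as a greedy merge that only ever considers groups still below $N$. This is a minimal illustration, not a reference to an existing library routine: the function name `merge_small_groups`, the average-linkage similarity between merged groups, and the similarity matrix `sim` are all assumptions made up for the sketch.

```python
from itertools import combinations


def merge_small_groups(sizes, similarity, min_size):
    """Greedily merge the most similar pair of under-sized groups.

    sizes      -- list with the size of each original group
    similarity -- square matrix; similarity[i][j] is the (assumed, precomputed)
                  similarity between original groups i and j
    min_size   -- N, the minimum desired cluster size
    Returns a sorted list of clusters, each a sorted list of group indices.
    """
    clusters = [[i] for i in range(len(sizes))]

    def size(cluster):
        return sum(sizes[i] for i in cluster)

    def avg_similarity(a, b):
        # Average linkage over the original groups (an assumption of this sketch).
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while True:
        # Groups that already reach min_size are never merge candidates.
        small = [c for c in clusters if size(c) < min_size]
        if len(small) < 2:
            break
        a, b = max(combinations(small, 2), key=lambda pair: avg_similarity(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


# Sizes from the example (0-based indices); the similarity values are hypothetical.
sizes = [10, 8, 6, 4, 3]
sim = [
    [1.0, 0.9, 0.9, 0.2, 0.2],
    [0.9, 1.0, 0.9, 0.3, 0.3],
    [0.9, 0.9, 1.0, 0.3, 0.3],
    [0.2, 0.3, 0.3, 1.0, 0.8],
    [0.2, 0.3, 0.3, 0.8, 1.0],
]
result = merge_small_groups(sizes, sim, min_size=9)
print(result)
```

With these inputs the sketch leaves $c_1$ alone and merges $c_2$ with $c_3$ and $c_4$ with $c_5$. Note that the last merged group has size $4 + 3 = 7$, which is still below $N = 9$; because no other under-sized group remains, this sketch simply stops there, so the size constraint is a target rather than a guarantee.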
Topic imbalance clustering
Category Data Science