What Clustering Method Should I Use?

My data is a group of 10 thousand points (each having an node location (x,y)) that are spread across a plane. They are also chromatically-colored based on their weight.

I need to finalize a bayesian nonparametric clustering method that groups points on mainly weight, but also on distance: that is, clusters are by defintion have some relevance to distance, but there are clear topological distinguishing factors between the first quarter and the last quarter of data (I say quarter as an arbitrary amount; in reality, the exact number and topology of the cluster changes through iterations).

As you can see in this picture above, I’ve tried to use notability to create a crude chromatically-colored image of data with varying cluster topology types; over each iteration of my algorithm, the clusters, as mentioned, change location (based on their weight) and their shape, and some overlap (and the possiblility of new clusters (or a possible decrease in the total number) is very high per iteration, where this image represents one iteration of x points)

Additionally, since I am doing this analysis with data via python, i was thinking of using the T-SNE machine learning package as a substitute for a generic clustering method, but I only have a limited knowledge on its functionality. Also, since my data is based on the same weighted scale, it may be overkill.

EDIT: I changed the picture to show overlapping clusters, so it is clearer what I mean. However, keep in mind even these visible clusters are not homogenous in weight (they still vary but in a small threshold). Sure there is noise, but I really want to treat each cluster independently to see each cluster’s behavior over time (as well as clusters that are newly formed, hence the nonparametric method)

Topic bayesian-nonparametric tsne python clustering

Category Data Science


One option is spectral clustering. Spectral clustering can find the "connectedness" in data.


It looks like you already have decided that you need a Bayesian Non-Parametric approach, why not start then with the Dirichlet Process and then see if the results are satisfactory. You don't mention the reasons why you need to use a Bayesian nonparametric approach, so I am not sure of the entire background here.


I would look at "Fuzzy-C Clustering".

This type of clustering is "soft" in that it provides a likelihood of a given point in a given cluster based upon weights, etc.

Below are some links to get into the weeds a little...

Towards Data Science, Wikipedia and the Python docs.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.