Conditional clustering

I have a dataset of addresses (points) with several attributes, including one that identifies the sort of address and one that holds a numerical value.

I want to cluster these points based on:

  1. their distance from each other
  2. the sort of address

However, the summed numerical attribute per cluster cannot exceed a certain threshold value.

In other words, the system needs to form clusters, but a cluster must stop growing as soon as the sum of the numerical values attached to its addresses reaches the threshold.

How do I even go about it? I have R, Python, and other geo-applications at my disposal.

It seems that none of the existing clustering algorithms work. For k-means, for example, I need to know the number of clusters beforehand, which I don't.

It seems rather simple, but I can't find a basic methodology to follow.


Based on your comments, you are looking for agglomerative hierarchical clustering.

You start with each point as its own cluster. Then you iterate over pairs of clusters, merging them according to some criterion, typically the pair that is closest together.

Typically you need to select a "cut point" after which you stop combining clusters. This is not an easy problem in general, and for the most part involves eyeballing your data until it "looks right", much like choosing k in k-means. In your case, however, you can use the external criterion you have in mind. You will need to recompute its value for every candidate merge, and refuse any merge that would push the cluster's sum past the desired threshold. Clustering stops when no allowed merge remains.
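As a rough illustration of that idea, here is a minimal Python sketch, not a definitive implementation. Several things in it are assumptions rather than part of the question: the function name `capped_agglomerative`, the `(x, y, kind, value)` tuple layout, planar coordinates with Euclidean distance between cluster centroids (for latitude/longitude you would swap in a great-circle distance), and treating "sort of address" as a hard rule that only addresses of the same sort may share a cluster.

```python
import math
from itertools import combinations


def capped_agglomerative(points, max_sum):
    """Greedy agglomerative clustering with a cap on each cluster's sum.

    points:  list of (x, y, kind, value) tuples (hypothetical layout).
    max_sum: largest allowed total of `value` inside one cluster.
    Returns a list of clusters, each a list of indices into `points`.
    """
    # Every point starts as its own cluster.
    clusters = [[i] for i in range(len(points))]

    def total(cluster):
        return sum(points[i][3] for i in cluster)

    def kind(cluster):
        # All members share one kind, so the first member is enough.
        return points[cluster[0]][2]

    def centroid(cluster):
        xs = [points[i][0] for i in cluster]
        ys = [points[i][1] for i in cluster]
        return sum(xs) / len(xs), sum(ys) / len(ys)

    while True:
        best = None  # (distance, index_a, index_b) of the closest allowed pair
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            if kind(ca) != kind(cb):
                continue  # assumption: never mix sorts of address
            if total(ca) + total(cb) > max_sum:
                continue  # this merge would break the threshold
            d = math.dist(centroid(ca), centroid(cb))
            if best is None or d < best[0]:
                best = (d, a, b)
        if best is None:
            break  # no allowed merge is left, so stop
        _, a, b = best
        # combinations() guarantees a < b, so deleting b keeps index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Made-up example data: three "home" addresses and two "shop" addresses.
addresses = [
    (0.0, 0.0, "home", 4),
    (0.0, 1.0, "home", 5),
    (0.0, 2.5, "home", 3),
    (5.0, 5.0, "shop", 2),
    (5.0, 6.0, "shop", 2),
]
print(capped_agglomerative(addresses, max_sum=10))
```

The number of clusters is never specified up front; it falls out of the threshold. This brute-force version rescans all pairs on every pass, so it is only practical for small datasets. It is also greedy, so it does not guarantee the fewest possible clusters.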
