Spatially constrained geospatial similarity

What's the current methodology for clustering geospatial data by features?

Example: suppose I have a demographic dataset that contains average home price and population density for each area.

One relationship of interest here is home price vs. population density. The tricky part is how the clusters get formed. For example, an affluent area with high population density isn't the same as an affluent area with low population density. My concern is that a basic distance metric wouldn't take this into account, since highs and lows across features could offset each other and give similar distances. This leads me to think some form of weighted clustering is needed to place the centroids.

I'm not sure which methodology takes this into account.

Topic data-analysis regression geospatial scikit-learn pandas

Category Data Science


I assume you are trying to find a suitable distance metric based on the features of different areas (although spatial distances could also easily be plugged in). In that case, I would first make sure the different features are on comparable scales, for example by standardizing each one to zero mean and unit variance. Otherwise the feature with the largest numeric range (here, home price) dominates the distance.
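To illustrate why scaling matters, here is a minimal sketch using NumPy. The four areas and their values are made up for illustration, and `standardize` and `euclidean` are hypothetical helper names, not part of any library. On the raw data, two areas with similar prices look close even when their densities differ enormously. After standardizing, a density gap counts about as much as a price gap.

```python
import numpy as np

# Hypothetical toy data, one row per area:
# columns = [average home price (USD), population density (people per km^2)]
areas = np.array([
    [900_000.0, 8_000.0],  # affluent, dense
    [880_000.0,   200.0],  # affluent, sparse
    [250_000.0, 7_800.0],  # modest, dense
    [240_000.0,   300.0],  # modest, sparse
])


def standardize(features):
    """Scale each column to zero mean and unit variance (z-scores)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)  # population standard deviation (ddof=0)
    return (features - mean) / std


def euclidean(a, b):
    """Plain Euclidean (L2) distance between two feature vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


scaled = standardize(areas)

# Areas 0 and 1 have similar prices but very different densities.
raw_same_price = euclidean(areas[0], areas[1])
scaled_same_price = euclidean(scaled[0], scaled[1])

# Areas 0 and 2 have similar densities but very different prices.
raw_same_density = euclidean(areas[0], areas[2])
scaled_same_density = euclidean(scaled[0], scaled[2])

print("raw:   ", raw_same_price, raw_same_density)
print("scaled:", scaled_same_price, scaled_same_density)
```

The scaled array can then be passed to whatever clustering algorithm you prefer. If you already use scikit-learn, `sklearn.preprocessing.StandardScaler` performs the same per-column standardization.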

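If the result still does not seem right, I would also try different distance metrics. A simple alternative is the L1 norm (Manhattan distance), which sums the absolute differences over all features x:

L1(a, b) = sum_x |x_a - x_b|

Here x_a and x_b are the values of feature x for areas a and b.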
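As a rough sketch of how the L1 norm behaves compared with the usual Euclidean (L2) distance, the snippet below computes both for a few made-up, already standardized feature vectors. The function names `l1_distance` and `l2_distance` and the sample vectors are assumptions for illustration only.

```python
import numpy as np


def l1_distance(a, b):
    """L1 (Manhattan) distance: sum of absolute per-feature differences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b)))


def l2_distance(a, b):
    """L2 (Euclidean) distance, shown for comparison."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))


# Hypothetical standardized features: [home price, population density]
base = [0.0, 0.0]
one_big_gap = [2.0, 0.0]     # differs a lot in one feature only
two_small_gaps = [1.0, 1.0]  # differs moderately in both features

# L1 treats both cases as equally far from the base area.
l1_big = l1_distance(base, one_big_gap)
l1_small = l1_distance(base, two_small_gaps)

# L2 penalizes the single large gap more than two moderate ones.
l2_big = l2_distance(base, one_big_gap)
l2_small = l2_distance(base, two_small_gaps)

print("L1:", l1_big, l1_small)
print("L2:", l2_big, l2_small)
```

Note that because both metrics work on absolute or squared differences, a high value in one feature cannot cancel a low value in another. The metrics differ only in how strongly a single large gap is weighted relative to several small ones.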
