Probabilistic Machine Learning model to match spatial data

Question

ajroot

April 9, 2022 02:03

I have spatial data from multiple sources. This data consists of ID, lat, long, and time.

My goal is a model that, given a new lat-long, returns the existing data points that match it, preferably with a probability for each. The matching should be based on the features (such as lat, long, and timestamp).

The only approach I could think of is clustering, i.e. cluster the dataset and predict which cluster the new point belongs to. The drawback is that if a cluster contains many points, it is hard to pinpoint which point in the cluster is the closest match to the new point.

Are there any other ways to do this? Is there a suitable probabilistic model (an HMM, perhaps)?

Topic probability model-selection geospatial

Category Data Science

Attack68 · Accepted Answer · March 11, 2019 18:25

So you have an existing dataset: $$\mathbf{X} = \{ [id_i, latitude_i, longitude_i, time_i] : i \in \{1,\dots,n\} \} $$ and you receive a new sample: $$ \mathbf{x}_* = [latitude, longitude, now] $$ and you want to determine, with a probability, which data points in $\mathbf{X}$ match $\mathbf{x}_*$?

At the risk of pointing out the obvious, why not just use K-nearest-neighbours with $K=1$? You will need to establish a distance metric that accounts for time, for example by equating a time difference to an equivalent difference in longitude and latitude, or you could ignore time altogether.
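As a minimal sketch of the $K=1$ case that ignores time altogether, the snippet below scans the dataset for the record with the smallest great-circle (haversine) distance to the query. The function names, the record layout `(id, lat, lon, time)`, the sample cities, and the choice of haversine distance are all assumptions made for illustration; they are not part of the original answer.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_point(records, lat, lon):
    """Return (record, distance_km) for the record closest to (lat, lon).

    records is an iterable of (id, lat, lon, time) tuples; time is ignored here.
    """
    best = min(records, key=lambda r: haversine_km(r[1], r[2], lat, lon))
    return best, haversine_km(best[1], best[2], lat, lon)


# Hypothetical sample data: (id, lat, lon, time)
records = [
    ("a", 51.5074, -0.1278, 0),   # London
    ("b", 48.8566, 2.3522, 0),    # Paris
    ("c", 40.7128, -74.0060, 0),  # New York
]
match, dist_km = nearest_point(records, 50.8503, 4.3517)  # query near Brussels
```

A linear scan like this is fine for small datasets; for large ones a spatial index would be the usual next step.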

Distance Metric

The notion of "nearest" depends on some measure of distance. On a map this is intuitive: each point has coordinates, and given a sample data point you can simply find the nearest data point to it on the map.

But what if you have another dimension for which distance is not intuitively obvious? How to handle it is often specific to the problem. Suppose you were searching for two criminals in a city: criminal Adam was last seen at coordinates (0,0) one week ago, and criminal Brian was last seen at coordinates (1,1) one day ago. Now you have a new sighting at (0.25, 0.25). This is geographically closer to (0,0) than to (1,1), but Brian was seen more recently, so perhaps he is the more likely lead to allocate search resources to?
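One way to make this trade-off concrete is to rescale time into map units and fold it into an ordinary Euclidean distance. The sketch below applies that idea to the Adam/Brian example; `time_scale` (how many map units one day is treated as equivalent to) is a hypothetical, problem-specific constant, and the values tried here are arbitrary.

```python
import math


def spacetime_distance(a, b, time_scale):
    """Euclidean distance over (x, y, t) with time rescaled into map units.

    time_scale is an assumed constant: map units per day of elapsed time.
    """
    dx = a[0] - b[0]
    dy = a[1] - b[1]
    dt = (a[2] - b[2]) * time_scale
    return math.sqrt(dx * dx + dy * dy + dt * dt)


# (x, y, time in days relative to now) for the last-known sightings
sightings = {"Adam": (0.0, 0.0, -7.0), "Brian": (1.0, 1.0, -1.0)}
new_sighting = (0.25, 0.25, 0.0)


def nearest(time_scale):
    """Name of the person whose last sighting is closest under the combined metric."""
    return min(
        sightings,
        key=lambda name: spacetime_distance(sightings[name], new_sighting, time_scale),
    )


ignoring_time = nearest(0.0)   # geography only
weighting_time = nearest(0.2)  # one day counts as 0.2 map units
```

With time ignored the geographically closer sighting wins; once elapsed time is weighted heavily enough, the more recent sighting becomes the nearest neighbour. Choosing the weight is the problem-specific part.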

When using K-nearest-neighbours, you may need a transformation function to convert distances into probabilities. There are many possible choices; the softmax function, applied to negated distances so that closer points score higher, is one possible starting point.
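A minimal sketch of that conversion, assuming the distances to the candidate points have already been computed: negate each distance, divide by a temperature, and normalise with softmax. The function name `match_probabilities`, the `temperature` parameter, and the sample distances are illustrative assumptions.

```python
import math


def match_probabilities(distances, temperature=1.0):
    """Softmax over negated distances: a smaller distance gives a larger probability.

    temperature is an assumed tuning knob: lower values concentrate the
    probability mass on the nearest candidates.
    """
    scores = [-d / temperature for d in distances]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical distances from the new point to three candidate data points
probs = match_probabilities([0.35, 1.06, 2.50])
sharper = match_probabilities([0.35, 1.06, 2.50], temperature=0.1)
```

These values are only a normalised ranking of the candidates, not calibrated probabilities; the temperature would need to be chosen or fitted against known matches.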
