Estimating location in a model

I have a large dataset with 10 columns and about 100,000 rows. Every 5 consecutive rows represent one tracked person, with columns for the tracking data such as time and velocity. The last two columns are that person's longitude and latitude.

To test the model, the test set has the longitude and latitude missing from the fifth row for each person, and the model has to estimate them. What's the best way to approach this problem?

For example, the test set looks like this:

id   time    feature2  feature3  long    lat
1      x          x        x     number  number
1      x          x        x     number  number
1      x          x        x     number  number
1      x          x        x     number  number
1      x          x        x     
2      x          x        x     number  number
2      x          x        x     number  number
2      x          x        x     number  number
2      x          x        x     number  number
2      x          x        x     

etc

Topic machine-learning-model predictive-modeling algorithms machine-learning

Category Data Science


One option would be to cluster the longitude and latitude values. Exact point estimates of longitude and latitude would be wrong much of the time. Clustering lowers the precision of the target: the model predicts which cluster a person ends up in rather than an exact coordinate, which increases the chance of it being approximately correct.
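As a rough illustration of trading precision for robustness, here is a minimal sketch that snaps coordinates to a fixed square grid. The cell size `CELL_DEG`, the helper names `to_cell` and `cell_center`, and the sample coordinates are all assumptions made for this example; they are not part of the original dataset.

```python
import math

# Hypothetical cell size in degrees (0.01 degrees of latitude is roughly 1.1 km).
CELL_DEG = 0.01

def to_cell(lat, lon, cell_deg=CELL_DEG):
    """Map a coordinate to the integer (row, col) index of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def cell_center(cell, cell_deg=CELL_DEG):
    """Return the (lat, lon) at the middle of a grid cell."""
    row, col = cell
    return ((row + 0.5) * cell_deg, (col + 0.5) * cell_deg)

# Two made-up nearby points: different exact coordinates, same cell.
a = to_cell(40.7128, -74.0060)
b = to_cell(40.7131, -74.0052)
print(a == b)          # both points share one cell label
print(cell_center(a))  # a coarse point estimate recovered from the label
```

With labels like these, the fifth row can be treated as a classification target (which cell?) instead of a regression target, and the cell center gives a coarse location if one is still needed.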

Longitude and latitude can be clustered using a spatially aware index such as H3. Spatially aware indexing allows for bins of different sizes.
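The effect of bin size can be sketched without any extra dependency. The code below uses a plain square grid as a stand-in for H3, not H3 itself: the function `grid_cell`, its `resolution` scheme (cell side of 1 / 2**resolution degrees), and the sample points are assumptions for illustration only. With a real spatial index library, its own cell-lookup call would take the place of `grid_cell`.

```python
import math

def grid_cell(lat, lon, resolution):
    """Square-grid stand-in for a spatial index.

    Cell side is 1 / 2**resolution degrees, so a higher resolution
    means smaller bins. This is a hypothetical scheme, not H3's.
    """
    size = 1.0 / (2 ** resolution)
    return (resolution, math.floor(lat / size), math.floor(lon / size))

def count_cells(points, resolution):
    """Count how many distinct bins the points fall into."""
    return len({grid_cell(lat, lon, resolution) for lat, lon in points})

# Made-up coordinates for illustration.
points = [
    (40.7128, -74.0060),
    (40.7306, -74.0302),
    (40.7484, -74.0557),
    (40.6892, -74.0445),
]

coarse = count_cells(points, 0)  # 1-degree cells
fine = count_cells(points, 6)    # 1/64-degree cells
print(coarse, fine)
```

Coarse bins merge these points into fewer labels, which makes the prediction task easier but the answer less precise; fine bins do the opposite. The bin size is a tuning choice to validate against held-out data.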
