Data Engineering Stack - collect, transform and visualize geospatial data

I'm working on a side project where I collect geospatial data by web scraping and from the OSM API. I started with a simple Java application; however, I would like to turn it into a data pipeline, purely for learning purposes.

Unfortunately, my knowledge of the tools, and especially of how to connect them, is, well, limited.

What is my goal?
As a final result I want to visualize the scraped geospatial points on a map, with the roads connecting them (from OSM).

Current flow:
In a standalone Java application I scrape the data for the geospatial points. A separate client consumes the OSM API for the data I need.

What I think might be useful:
Use Apache Spark for collecting and transforming the data, then somehow use GeoSpark or GeoTrellis together with Zeppelin to visualize it. I was also thinking about Elasticsearch + Kibana for the geodata, but it looks like Zeppelin is enough.
I feel more comfortable working with Java than with Scala.

What do you think? Are there any better tools I can use? Did I miss anything?


OpenStreetMap has the Overpass API for getting this data. OSM uses a specific data model consisting of nodes, ways and relations, which you can translate into points and other geometries, and from there into whatever data structure you are used to manipulating.
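As a rough illustration of that translation, here is a stdlib-only Python sketch. It assumes the public endpoint `https://overpass-api.de/api/interpreter`, an example Overpass QL query (schools plus `highway=primary` ways inside a bounding box) and `out geom;` output, which attaches coordinates directly to ways. The helper names `build_query`, `fetch` and `to_geojson` are hypothetical. Nodes become GeoJSON Points and ways become LineStrings; relations are skipped because assembling them takes more work.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the main public Overpass instance; other instances exist.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_query(south, west, north, east, highway="primary"):
    """Example Overpass QL: schools (nodes) and highway ways inside a bbox."""
    bbox = f"{south},{west},{north},{east}"  # Overpass bbox order: S, W, N, E
    return (
        "[out:json][timeout:25];\n"
        "(\n"
        f'  node["amenity"="school"]({bbox});\n'
        f'  way["highway"="{highway}"]({bbox});\n'
        ");\n"
        "out geom;"  # 'geom' makes Overpass include coordinates for ways
    )


def fetch(query):
    """POST the query as form data and return the decoded JSON (needs network)."""
    data = urllib.parse.urlencode({"data": query}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=data) as resp:
        return json.load(resp)


def to_geojson(response):
    """Translate OSM nodes into Points and ways into LineStrings."""
    features = []
    for el in response.get("elements", []):
        if el["type"] == "node":
            geom = {"type": "Point", "coordinates": [el["lon"], el["lat"]]}
        elif el["type"] == "way" and "geometry" in el:
            geom = {
                "type": "LineString",
                # GeoJSON uses [lon, lat] order
                "coordinates": [[p["lon"], p["lat"]] for p in el["geometry"]],
            }
        else:
            continue  # relations need extra assembly; skipped in this sketch
        features.append({
            "type": "Feature",
            "id": f'{el["type"]}/{el["id"]}',
            "geometry": geom,
            "properties": el.get("tags", {}),
        })
    return {"type": "FeatureCollection", "features": features}
```

Something like `to_geojson(fetch(build_query(-5.9, -35.3, -5.7, -35.1)))` would then give a FeatureCollection that can be written to a `.geojson` file and read from Python or Java. Keep in mind that Overpass bounding boxes are ordered (south, west, north, east), while GeoJSON coordinates are [lon, lat].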

If you want to do this with a Python library, I recently wrote geohunter, a parser that converts this data model into a geopandas GeoDataFrame (the most commonly used spatial data structure in Python nowadays). You can then export your GeoDataFrame to GeoJSON or a shapefile with a simple call such as to_file('points.geojson', driver='GeoJSON') and import the file into your Java app, or dump the GeoDataFrame into a GeoJSON string with to_json().

You can also store the results in MongoDB, which has pretty nice GeoJSON support.
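A minimal sketch of the MongoDB route, assuming `pymongo`, a local server at `mongodb://localhost:27017`, hypothetical `geo.points` database/collection names, and documents whose `geometry` field holds GeoJSON (for example the features of a FeatureCollection). A `2dsphere` index on that field is what lets MongoDB answer `$near` queries, and `$maxDistance` is in meters for GeoJSON points. `pymongo` is imported inside the function only so that the query builder can be used without it installed.

```python
def near_query(lon, lat, max_meters):
    """Filter for documents whose GeoJSON 'geometry' lies near a point."""
    return {
        "geometry": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_meters,  # meters, for GeoJSON points
            }
        }
    }


def store_and_query(features, lon, lat, max_meters,
                    uri="mongodb://localhost:27017"):
    """Insert GeoJSON features and return those within max_meters of (lon, lat)."""
    # Imported here so near_query() works without pymongo installed.
    from pymongo import GEOSPHERE, MongoClient

    client = MongoClient(uri)
    coll = client["geo"]["points"]  # hypothetical database/collection names
    coll.create_index([("geometry", GEOSPHERE)])  # 2dsphere index, needed by $near
    if features:  # insert_many rejects an empty list
        coll.insert_many(features)
    return list(coll.find(near_query(lon, lat, max_meters)))
```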

This is an example of how to get OSM data in GeoDataFrame using geohunter.

import geohunter
api = geohunter.osm.Eagle()

# Get the city df you want to analyze
city = api.get(bbox='(-8.02, -41.01, -3.0, -33.0)',
               largest_geom=True,
               name='Natal')

# Get some points from the map features available on OSM
poi = api.get(city,
              amenity=['school', 'hospital'],
              highway='primary',
              natural='*')

To find out which map features (types of data) OSM has available, see the OSM documentation.

The geohunter library may still have bugs when parsing some geometries. If you run into one, open an issue and let's discuss it.


You can do this in a waaay easier way than what you are currently doing.

For the data scraping, use whatever makes you happy. In my case I would use UiPath or just Python, depending on the complexity. But this is up to you; you just want a dataset in a format that suits you.
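If you go the plain-Python route, the scraper itself can stay small. The sketch below uses only the standard library's `html.parser` and assumes hypothetical markup in which each place carries `data-name`, `data-lat` and `data-lon` attributes; a real site will need its own selectors, and downloading the pages is left out. The output is a CSV with one point per row.

```python
import csv
import io
from html.parser import HTMLParser


class PointParser(HTMLParser):
    """Collect (name, lat, lon) from tags carrying data-lat/data-lon.

    The attribute names are hypothetical; adapt them to the scraped site.
    """

    def __init__(self):
        super().__init__()
        self.points = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attrs is a list of (name, value) pairs
        if a.get("data-lat") and a.get("data-lon"):
            self.points.append(
                (a.get("data-name", ""), float(a["data-lat"]), float(a["data-lon"]))
            )


def to_csv(points):
    """Serialize (name, lat, lon) tuples as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["name", "lat", "lon"])
    writer.writerows(points)
    return buf.getvalue()
```

Usage: create a `PointParser()`, call `feed(html_text)` for each downloaded page, then write `to_csv(parser.points)` to a file.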

Once you have your data, you want to visualize it. This is a classic data science task. I am from the Python army, so I suggest going with Python. If you are good with Java, the transition will be like slicing through hot butter.

There are some good libraries for this. Here are some packages that I have used in the past and that will help you greatly:

  1. Folium
  2. Geoplot
  3. Intro to folium (again)

In my case, I would probably stick to Folium, since there is plenty of code to reuse on the internet and it is a piece of cake.
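For the goal in the question (scraped points plus the roads connecting them), a Folium sketch could look like the following. It assumes points as `(name, lat, lon)` tuples and each road as a list of `(lat, lon)` pairs, for example taken from OSM ways; `map_center` and `render_map` are hypothetical helper names, while `folium.Map`, `folium.Marker`, `folium.PolyLine` and `Map.save` are Folium's own API. Folium expects [lat, lon], the reverse of GeoJSON's [lon, lat], which is an easy mistake when mixing the two.

```python
from statistics import mean


def map_center(points):
    """Average [lat, lon] of (name, lat, lon) tuples, used to center the map."""
    return [mean(p[1] for p in points), mean(p[2] for p in points)]


def render_map(points, roads, path="map.html"):
    """Draw points as markers and roads as polylines, then save to HTML.

    points: list of (name, lat, lon); roads: list of [(lat, lon), ...].
    """
    import folium  # imported here so map_center() works without folium

    m = folium.Map(location=map_center(points), zoom_start=13)
    for name, lat, lon in points:
        folium.Marker([lat, lon], popup=name).add_to(m)
    for road in roads:
        folium.PolyLine(road, color="blue", weight=3).add_to(m)
    m.save(path)
    return m
```

Opening the saved `map.html` in a browser shows an interactive Leaflet map with the markers and road lines.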

This is my personal opinion on what I would do. There might be other tools and languages...

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.