Database options for JSON storage, queried with Apache Drill

I am planning to set up a JSON storage system. It will store tens of millions of JSON records, all in the same format. I'd like to be able to query the data using Apache Drill.

It looks like there is Drill support for MongoDB and Postgres. However, I'm unsure of the pros and cons of each, and how I'd structure the schema if I'd choose Postgres.

Topic json mongodb bigdata

Category Data Science


Would be beneficial to know the structure of the data and what you are wanting to do with the data. Do you really need a database at all? Drill will read from a filesystem/S3/etc?

But to answer your question with the information you provided.

The pro of MongoDB over PostgreSQL would be in scaling and read performance (after a certain scale). If this is mainly a read only, or analytic database (primarily reads) then MongoDB would probably work.

PostgreSQL has the advantage of ACID compliance and more than likely better performance if you don't go above the ~10's of millions of records (especially with JSONB, which supports indexes on the datatype).

Without knowing much about the data, you could simply have two columns within your postgresql JSON table (id, JSON_field) and then use the functions within PostgreSQL to query your JSON field. It is a little more involved in querying JSON data within PostgreSQL but after you get used to it, it is pretty simple.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.