is this a good case for NOSQL?

I'm currently facing a project that I could solve with a relational database in a relatively painful way. Having heard so much about NOSQL, I'm wondering if there is not a more appropriate way of tackling it:

Suppose we are tracking a group of animals in a forest (n ~ 500) and would like to keep a record of a set of observations (this is a fictional scenario).

We would like to store the following information in a database:

  • a unique identifier for each animal
  • a description of the animal with structured fields: Species, Genus, Family, ...
  • a free text field with additional information
  • each time-point at which it was detected close to a reference point
  • a picture of the animal
  • an indication whether two given animals are siblings

And:

  • there might be additional features appearing later as more data comes in

We would like to be able to execute the following types of queries:

  • return all the animals spotted between in a given time interval
  • return all the animals of a given Species or Family
  • perform a text search on the free text field

Which particular database system would you recommend ? Is there any tutorial / examples that I could use as a starting point ?

Topic nosql databases

Category Data Science


Three tables: animal, observation, and sibling. The observation has an animal_id column which links to the animal table, and the sibling table has animal_1_id and animal_2_id columns that indicates two animals are siblings for each row.

Even with 5000 animals and 100000 observations I don't think query time will be a problem for something like PostgreSQL for most reasonable queries (obviously you can construct unreasonable queries but you can do that in any system).

So I don't see how this is "relatively painful". Relative to what? The only complexity is the sibling table. In NOSQL you might store the full list of siblings in the record for each animal, but then when you add a sibling relationship you have to add it to both sibling's animal records. With the relational table approach I've outlined, it only exists once, but at the expense of having to test against both columns to find an animal's siblings.

I'd use PostgreSQL, and that gives you the option of using PostGIS if you have location data - this is a geospatial extension to PostgreSQL that lets you do spatial queries (point in polygon, points near a point etc) which might be something for you.

I really don't think the properties of NOSQL databases are a problem here for you - you aren't changing your schema every ten minutes, you probably do care that your database is ACID-compliant, and you don't need something web-scale.

http://www.mongodb-is-web-scale.com/ [warning: strong language]

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.