NoSQL vs SQL backend for semi-structured data

I have a corpus of job descriptions and another corpus of applicants' CVs. I plan to implement a matching system using machine learning algorithms to find the top 5 or top 10 applicants for each job description. Should I store the data in a document-oriented NoSQL database (MongoDB) or stick to SQL?

Given that the data I have is semi-structured at best, I feel a NoSQL db will offer more flexibility. I would appreciate opinions on this.



I would use SQL and create a set of structured fields that are common across all applications (name, school, years of experience, job they are applying for, etc.), plus a field containing the raw application for the 'semi-structured' part of the data. You can always do something fancy with the raw application field later, but if you want to get summary statistics quickly, SQL is the proper route.
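As a rough sketch of that layout, here is one way it could look using Python's built-in sqlite3 module. The table name, column names, and sample rows are all made up for illustration; your real schema would depend on which fields you can reliably extract from the CVs.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with structured columns plus a raw text column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE applications (
            id               INTEGER PRIMARY KEY,
            name             TEXT NOT NULL,
            school           TEXT,
            years_experience REAL,
            job_id           INTEGER NOT NULL,
            raw_application  TEXT NOT NULL  -- the unparsed, semi-structured CV
        )
        """
    )
    # Hypothetical sample rows, purely illustrative.
    rows = [
        ("Applicant A", "School X", 3.0, 1, "Free-text CV of applicant A"),
        ("Applicant B", "School Y", 7.0, 1, "Free-text CV of applicant B"),
        ("Applicant C", "School X", 5.0, 2, "Free-text CV of applicant C"),
    ]
    conn.executemany(
        "INSERT INTO applications "
        "(name, school, years_experience, job_id, raw_application) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def experience_by_job(conn):
    """Return (job_id, applicant count, average years of experience) per job."""
    cur = conn.execute(
        "SELECT job_id, COUNT(*), AVG(years_experience) "
        "FROM applications GROUP BY job_id ORDER BY job_id"
    )
    return cur.fetchall()


conn = build_demo_db()
summary = experience_by_job(conn)
```

The structured columns answer the quick summary questions with a single GROUP BY, while the raw_application column stays available for whatever text processing the matching model needs.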

In general, thinking carefully about your schema up front will pay big dividends in the future. I would recommend NoSQL only if you really have no idea how to structure things and are pressed for time. In MongoDB, for instance, even getting a list of the keys used across a collection has traditionally required a MapReduce job.
