Data store for testing data products?

Is there a recommended approach for storing processed data for testing new data products?

Basically, I'd like to have a system where a data scientist or an analyst could think of a new data product to present to users, do the data processing to create it, and then put it in a data store that our application can then access easily.

What I'm not sure about is what kind of data store would be good for this type of "testing" use case. It would need to be flexible enough to handle different types of data products, like aggregates, windowed data, etc. Ideally it also wouldn't require a huge instrumentation process to try out new things.

Topic sql nosql

Category Data Science


You might try Azure Table Storage. You can't lock yourself into a specific schema, because one data product might be aggregates whereas another might be time series or something else. Azure Table Storage gives you the flexibility to store data from multiple sources, each with its own format.
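To make the "no fixed schema" point concrete, here is a minimal sketch of the entity model: each entity needs only a partition key and a row key, and the remaining properties can differ from one entity to the next. The `InMemoryTable` class, the product names, and the field names are all made up for illustration. It is an in-memory stand-in, not the Azure SDK, so treat it as a picture of the idea rather than working storage code.

```python
from collections import defaultdict


class InMemoryTable:
    """Hypothetical stand-in for a key-value table; this is not the Azure SDK."""

    def __init__(self):
        # partition key -> row key -> entity (a plain dict)
        self._partitions = defaultdict(dict)

    def upsert(self, entity):
        # Only the two keys are required; every other property is free-form.
        pk, rk = entity["PartitionKey"], entity["RowKey"]
        self._partitions[pk][rk] = dict(entity)

    def query_partition(self, partition_key):
        # Return all entities of one data product, ordered by row key.
        rows = self._partitions.get(partition_key, {})
        return [rows[key] for key in sorted(rows)]


table = InMemoryTable()

# Data product 1: a daily aggregate (illustrative values).
table.upsert({"PartitionKey": "daily_revenue", "RowKey": "2024-01-01",
              "total": 1250.0, "orders": 42})

# Data product 2: a windowed summary with entirely different fields.
table.upsert({"PartitionKey": "cpu_7d_window", "RowKey": "host-a",
              "mean": 0.61, "p95": 0.93, "window_days": 7})

aggregates = table.query_partition("daily_revenue")
windows = table.query_partition("cpu_7d_window")
```

Using one partition per data product means the application can read a whole product with a single partition lookup, and an analyst can add a new product without touching the existing ones.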

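This would also lend itself to a highly scalable system, as you could use Azure Service Bus in conjunction with Azure Table Storage: for example, the processing jobs publish their results as messages, and a separate worker writes them to the table.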
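As a rough sketch of that queue-plus-table arrangement, the snippet below uses Python's standard `queue.Queue` as a stand-in for the message bus and a plain dict as a stand-in for the table. None of this is the Azure API, and the message layout (`product`, `key`, `values`) is an assumption made up for the example. It only shows the decoupling: producers enqueue results, and one worker owns the writes.

```python
import queue
import threading

store = {}                 # stand-in for the table
messages = queue.Queue()   # stand-in for the message bus
STOP = object()            # sentinel telling the worker to shut down


def writer():
    # Drain the queue and persist each message until the sentinel arrives.
    while True:
        msg = messages.get()
        if msg is STOP:
            break
        store[(msg["product"], msg["key"])] = msg["values"]


worker = threading.Thread(target=writer)
worker.start()

# Producers (e.g. analysts' processing jobs) only need to know the queue.
messages.put({"product": "daily_revenue", "key": "2024-01-01",
              "values": {"total": 1250.0}})
messages.put({"product": "cpu_7d_window", "key": "host-a",
              "values": {"mean": 0.61}})

messages.put(STOP)
worker.join()
```

Because producers never talk to the store directly, you could add more producers or more writer workers without changing either side.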

You might also check out the Pluralsight tutorial Applied Windows Azure. It shows a number of examples, one using Table Storage and Service Bus and another using Hadoop, and I suspect that some of these might match the extensibility you are looking for.
