Best way to store large amounts of time series data: relational database (SQL) or NoSQL? Plus an additional Python/pandas question inside

I'm working on a project and I'm stuck on how to store my data. I have a concept I want to propose but am unsure whether it is possible; if it is not, I would appreciate any help pointing me in the right direction. As I said, I'm working on a small project. For this project I want to store two 2-dimensional arrays every 20 seconds or so and have the time (seconds from …
Category: Data Science

How to learn NoSQL databases and how to know when SQL or NoSQL is better

I want to learn about NoSQL and when it is better to use SQL or NoSQL. I know that the answer depends on the case, but I'm asking for good documentation on NoSQL and some explanation of when it is better to use SQL or NoSQL (use cases, etc.). Your opinions on NoSQL databases and any recommendations for learning about this topic are also welcome.
Topic: nosql
Category: Data Science

Is there an overview of recommender system architectures?

I want to learn more about recommender systems. I am very interested in how different database systems are used for this use case. My problem is that I cannot find a good overview of different recommender system architectures, especially one that focuses on the database part. Can someone help me out with a good reference or some thoughts of their own? Thanks a lot. As interesting as this topic is to me, it seems just as hard to get some …
Category: Data Science

Are there decisive leaders in programming with tabular data?

What are the most effective bread-and-butter in-memory open source tabular data frameworks today? I have been working with tabular data for years with an in-house solution that integrates well with Excel but falls short of many other expectations. I would like to (if possible/true) demonstrate that our solution has fallen behind the times. In other words, assuming an SQL-like platform is responsible for persistence of a data set, but cycle-intensive calculations need to be performed on that dataset (e.g. …
Category: Data Science

Dealing with a huge amount of data

I'm writing to get advice about my project. I want to build a recommender system for shops with some products. Specifically, I want to recommend that shop A stock item X because shop B sells this item and shops A and B are very similar. The "problem" here is the size of the data: I have around 5TB of raw data (about 8,000,000,000 lines), so it's very difficult to do something with huge data like …
Category: Data Science

The data in our relational DBMS is getting big; is it time to move to NoSQL?

We created a social network application for eLearning purposes. It's an experimental project that we are researching in our lab. It has been used in some case studies for a while, and the data in our relational DBMS (SQL Server 2008) is getting big. It's a few gigabytes now and the tables are highly connected to each other. The performance is still fine, but when should we consider other options? Is it a matter of performance?
Category: Data Science

Running SQL-like queries over large schemaless JSON dataset in the cloud?

I've got about 5 million JSON files, about 50GB in total. They do not have a consistent schema (they're broadly the same format, but some have extra extension fields, some have missing fields, etc - the schema is quite complexly nested). I would like to run SQL-like queries across these files - e.g. finding the count of files with a certain property, finding the count of files where property is in a numeric or time range, etc. I have the …
Category: Data Science

Seeking advice on database architecture -- given my problem, what tools should I learn?

I'm a fairly experienced R user, but until now I haven't had a good reason to learn to use databases. Now I have a problem where I am dealing with model output that I need to save to disk, and then query for another process. If the data were smaller, I'd store everything in a list, with hierarchical elements. For example, if my object is called output.OLS: 1> summary(output.OLS) Length Class Mode SEP0307 3 -none- list SEP0308 3 -none- list …
Category: Data Science

Data representation (NoSQL database?) for a medical study

Problem description: I have a data set of about 10,000 patients in a study. For each patient, I have a list of various measurements. Some of the information is scalar data (e.g. age), some is time series of measurements, and some can even be a bitmap. An individual record can be quite large (10kB to 10MB). The data is to be processed in practically two steps: Preprocessing at the level of individual records (patients), i.e. to extract some features in …
Category: Data Science

Any master's thesis topics related to NoSQL and Machine Learning or Business Intelligence?

I'm currently in my last year, and I want to do a master's thesis on a topic that combines NoSQL with Machine Learning or Business Intelligence. I definitely want NoSQL in my topic, so I want to add a complementary topic (machine learning or business intelligence) to it. From my research I know that NoSQL provides a mechanism for storage and retrieval of data which is modeled in means other than the tabular relations used in relational databases. And …
Category: Data Science

NoSQL vs SQL backend for semi-structured data

I have a corpus of job descriptions and another corpus of CVs of applicants. I plan to implement a matching system using machine learning algorithms to find the top 5 or top 10 applicants for each job description. Should I store the data in a document-oriented NoSQL database (MongoDB) or stick to SQL? Given that the data I have is semi-structured at best, I feel a NoSQL database will offer more flexibility. I would appreciate opinions on this.
Category: Data Science

Python interface to Titan Database

How can I connect to a Titan database from Python? What I understand is that Titan (a graph database) provides an interface (Blueprints) to Cassandra (a column store), and Bulbs is a Python interface to graph databases. How can I start programming in Python to connect with Titan? Is there any good documentation or tutorial available?
Category: Data Science

Data store for testing data products?

Is there a recommended approach for storing processed data for testing new data products? Basically, I'd like to have a system where a data scientist or an analyst could think of a new data product to present to users, do the data processing to create it, and then put it in a data store that our application can then access easily. What I'm not sure about is what kind of data store would be good for this type of "testing" …
Topic: sql nosql
Category: Data Science

Is this Neo4j comparison to RDBMS execution time correct?

Background: The following is from the book Graph Databases, which covers a performance test mentioned in the book Neo4j in Action: Relationships in a graph naturally form paths. Querying, or traversing, the graph involves following paths. Because of the fundamentally path-oriented nature of the data model, the majority of path-based graph database operations are highly aligned with the way in which the data is laid out, making them extremely efficient. In their book Neo4j in Action, Partner and Vukotic perform an experiment …
Category: Data Science

Can Hadoop with Spark be configured with 1GB RAM?

I'm trying to set up a cluster (1 namenode, 1 datanode) on AWS. I'm using the free one-year trial period of AWS, but the challenge is that the instance is created with 1GB of RAM. As I'm a student, I cannot afford much. Can anyone please suggest a solution? Also, it would be great if you could provide any links for setting up multi-cluster Hadoop with Spark on AWS. Note: I cannot try in GCE as my trial period is …
Category: Data Science

Is this a good case for NoSQL?

I'm currently facing a project that I could solve with a relational database in a relatively painful way. Having heard so much about NoSQL, I'm wondering whether there is a more appropriate way of tackling it: Suppose we are tracking a group of animals in a forest (n ~ 500) and would like to keep a record of a set of observations (this is a fictional scenario). We would like to store the following information in a database: a …
Category: Data Science
