So I'm working on a project, and I'm somewhat stuck on how to store my data. I have a concept I want to propose, but I'm unsure whether it is possible; if it is not, I would appreciate any help pointing me in the right direction. As I said, I'm working on a small project, and for it I want to store two 2-dimensional arrays every 20 seconds or so and have the time (seconds from …
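One possible sketch of the pattern described above (a pair of 2-D arrays stored every ~20 seconds, keyed by time) uses NumPy and SQLite; both are assumptions, since the question names no specific tools:

```python
import io
import sqlite3
import numpy as np

def to_blob(arr):
    """Serialize a NumPy array to bytes for BLOB storage."""
    buf = io.BytesIO()
    np.save(buf, arr)
    return buf.getvalue()

def from_blob(blob):
    """Deserialize bytes back into a NumPy array."""
    return np.load(io.BytesIO(blob))

conn = sqlite3.connect(":memory:")  # use a file path in real use
conn.execute("CREATE TABLE snapshots (t INTEGER PRIMARY KEY, a BLOB, b BLOB)")

# One sample every ~20 seconds: store both 2-D arrays under the timestamp.
t = 0
a = np.zeros((4, 4))
b = np.ones((4, 4))
conn.execute("INSERT INTO snapshots VALUES (?, ?, ?)",
             (t, to_blob(a), to_blob(b)))

# Later: query by time and restore the arrays.
row = conn.execute("SELECT a, b FROM snapshots WHERE t = 0").fetchone()
restored_a, restored_b = from_blob(row[0]), from_blob(row[1])
```

Keeping the timestamp as the primary key makes range queries over time cheap; the arrays themselves stay opaque blobs.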
I want to learn about NoSQL and when it is better to use SQL or NoSQL. I know the answer depends on the case, but I'm asking for good documentation on NoSQL and some explanation of when it is better to use SQL versus NoSQL (use cases, etc.). Your opinions on NoSQL databases and any recommendations for learning about this topic are also welcome.
I want to learn more about recommender systems. I am very interested in the usage of different database systems for this use case. My problem is that I cannot find a good overview of different recommender-system architectures, especially with a focus on the database part. Can someone help me out with a good reference or some thoughts of your own? Thanks a lot. As interesting as this topic is for me, it seems equally hard to get some …
What are the most effective bread-and-butter in-memory open-source tabular data frameworks today? I have been working with tabular data for years with an in-house solution that integrates well with Excel but falls short of many other expectations. I would like to (if possible/true) demonstrate that our solution has fallen behind the times. In other words, assume an SQL-like platform is responsible for persistence of a data set, but cycle-intensive calculations need to be performed on that dataset (e.g. …
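As one illustration of the split described above (SQL for persistence, an in-memory framework for the heavy calculation), here is a minimal sketch using pandas over SQLite; the table and column names are invented for the example:

```python
import sqlite3
import pandas as pd

# Hypothetical dataset persisted in an SQL store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (qty REAL, price REAL)")
conn.executemany("INSERT INTO trades VALUES (?, ?)",
                 [(10, 1.5), (20, 2.0), (5, 3.0)])
conn.commit()

# Pull the table into memory once, then do the cycle-intensive work
# with vectorized (C-backed) column operations rather than row-by-row SQL.
df = pd.read_sql_query("SELECT qty, price FROM trades", conn)
df["notional"] = df["qty"] * df["price"]
total = df["notional"].sum()
```

The design point is that the database only serves rows; all per-row arithmetic happens in the in-memory frame, which is where frameworks of this kind compete.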
I'm writing to get advice about my project. I want to build a recommender system for shops with some products. In essence, I want to recommend that shop A stock item X because shop B sells this item and shops A and B are very similar. The "problem" here is the size of the data: I have around 5TB of raw data (about 8,000,000,000 lines), so it's very difficult to do something with huge data like …
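The "shop A should stock X because similar shop B sells it" logic above can be sketched at toy scale as item-based filtering over a shop-by-item matrix; at the 5 TB scale described this would be a sparse matrix built shard by shard, but the math is the same:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy shop x item sales-count matrix (rows: shops A..C, cols: items X..Z).
sales = np.array([
    [5, 0, 2],   # shop A: never sells item Y
    [4, 3, 2],   # shop B: similar to A, but also sells item Y
    [0, 1, 9],   # shop C: quite different profile
])

shop_sim = cosine_similarity(sales)                   # shop-shop similarity
most_similar_to_A = int(np.argsort(shop_sim[0])[-2])  # [-1] is A itself

# Recommend to shop A the items its most similar shop sells but A does not.
candidate = sales[most_similar_to_A] > 0
missing = (sales[0] == 0) & candidate
recommended_items = np.where(missing)[0]
```

Here shop B comes out most similar to A, and item Y (index 1) is the recommendation, which is exactly the scenario in the question.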
We created a social network application for eLearning purposes. It's an experimental project that we are researching in our lab. It has been used in some case studies for a while, and the data in our relational DBMS (SQL Server 2008) is getting big. It's a few gigabytes now, and the tables are highly connected to each other. The performance is still fine, but when should we consider other options? Is it a matter of performance?
I've got about 5 million JSON files, about 50GB in total. They do not have a consistent schema (they're broadly the same format, but some have extra extension fields, some have missing fields, etc. - the schema is quite deeply nested). I would like to run SQL-like queries across these files - e.g. finding the count of files with a certain property, or the count of files where a property falls in a numeric or time range, etc. I have the …
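The kind of query described above can be prototyped as a plain file scan with tolerant field access; the directory and fields here are invented stand-ins, and at 50 GB a parallel engine (e.g. Spark SQL or Apache Drill) would do the same thing faster:

```python
import json
import pathlib
import tempfile

# Assumed layout: a directory of one-JSON-object-per-file documents with a
# loosely shared schema (fields may be missing or extra).
tmp = pathlib.Path(tempfile.mkdtemp())
docs = [
    {"id": 1, "price": 10},
    {"id": 2, "price": 25, "extra": "field"},  # extension field
    {"id": 3},                                  # "price" missing entirely
]
for d in docs:
    (tmp / f"{d['id']}.json").write_text(json.dumps(d))

def scan(path):
    """Stream every JSON file as a dict; tolerate schema drift."""
    for f in sorted(path.glob("*.json")):
        yield json.loads(f.read_text())

# Equivalent of: SELECT COUNT(*) WHERE price BETWEEN 5 AND 20
count = sum(1 for doc in scan(tmp)
            if 5 <= doc.get("price", float("-inf")) <= 20)
```

The key idiom is `doc.get(...)` with a sentinel default, so files missing the field drop out of range predicates instead of raising errors.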
I'm a fairly experienced R user, but until now I haven't had a good reason to learn to use databases. Now I have a problem where I am dealing with model output that I need to save to disk, and then query for another process. If the data were smaller, I'd store everything in a list, with hierarchical elements. For example, if my object is called output.OLS:

> summary(output.OLS)
        Length Class  Mode
SEP0307 3      -none- list
SEP0308 3      -none- list
…
When does a relational database, like MySQL, have better performance than a non-relational one, like MongoDB? I saw a question on Quora the other day about why Quora still uses MySQL as their backend, and how their performance is still good.
Problem description: I have a data set of about 10,000 patients in a study. For each patient, I have a list of various measurements. Some information is scalar data (e.g. age), some is a time series of measurements, and some can even be a bitmap. An individual record can be quite large (10kB to 10MB). The data is to be processed in essentially two steps: preprocessing at the level of individual records (patients), i.e. to extract some features in …
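The two-step shape described above (per-record preprocessing, then analysis over the extracted features) can be sketched as follows; the record fields are hypothetical, since the question doesn't name the actual measurements:

```python
# Step 1: per-record preprocessing (embarrassingly parallel across patients).
# Step 2: aggregate analysis over the small extracted feature vectors.
# Real records might hold time series or bitmaps loaded lazily from disk.

def extract_features(record):
    """Reduce one thick patient record to a small feature vector."""
    series = record["heart_rate"]           # example time-series field
    return {
        "patient_id": record["patient_id"],
        "age": record["age"],               # scalar passed through
        "hr_mean": sum(series) / len(series),
        "hr_max": max(series),
    }

records = [
    {"patient_id": 1, "age": 40, "heart_rate": [60, 70, 80]},
    {"patient_id": 2, "age": 55, "heart_rate": [90, 100]},
]

features = [extract_features(r) for r in records]            # step 1
mean_age = sum(f["age"] for f in features) / len(features)   # step 2
```

Because step 1 touches each record independently, the storage layer mostly needs fast retrieval of single records by key; only the small feature vectors need to support analytical queries.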
What is the best NoSQL backend for a mobile game? Users make a lot of server requests, and it also needs to retrieve users' historical records (like in-app purchases) and analytics of usage behavior.
I'm currently in my last year, and I want to do a master's thesis on a topic that combines NoSQL with Machine Learning or Business Intelligence. I definitely want NoSQL in my topic, so I want to add a complementary topic (machine learning or business intelligence) to it. From my research I know that NoSQL provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. And …
I have a corpus of job descriptions and another corpus of applicants' CVs. I plan to implement a matching system using machine learning algorithms to find the top 5 or top 10 applicants for each job description. Should I store the data in a document-oriented NoSQL database (MongoDB) or stick with SQL? Given that the data I have is semi-structured at best, I feel a NoSQL database will offer more flexibility. I would appreciate opinions on this.
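Whatever store is chosen, the matching step itself is storage-agnostic; a minimal sketch of top-k job-to-CV ranking with TF-IDF and cosine similarity (the corpora here are tiny invented stand-ins) looks like:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny stand-ins for the two corpora; real text would be loaded from
# whichever database (MongoDB or SQL) ends up holding the documents.
jobs = ["python developer with sql experience",
        "graphic designer fluent in photoshop"]
cvs = ["senior python and sql engineer",
       "photoshop illustrator portfolio designer",
       "java backend developer"]

vec = TfidfVectorizer()
matrix = vec.fit_transform(jobs + cvs)        # fit one shared vocabulary
job_vecs, cv_vecs = matrix[:len(jobs)], matrix[len(jobs):]

sims = cosine_similarity(job_vecs, cv_vecs)   # shape: (jobs, CVs)
top_k = 2
# For each job, indices of the top-k best-matching CVs, best first.
rankings = [list(row.argsort()[::-1][:top_k]) for row in sims]
```

Since the ranking only needs raw text in and a score matrix out, the SQL-vs-NoSQL choice can be made on schema-flexibility and operational grounds rather than on what the matcher requires.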
How can I connect to a Titan database from Python? What I understand is that Titan (a graph database) provides an interface (Blueprints) to Cassandra (a column store), and Bulbs is a Python interface to graph databases. Now, how can I start programming in Python to connect with Titan? Is there any good documentation/tutorial available?
I have heard about many tools/frameworks that help people process their data (in big data environments). One is called Hadoop, and the other is the NoSQL concept. What is the difference in terms of processing? Are they complementary?
Is there a recommended approach for storing processed data for testing new data products? Basically, I'd like to have a system where a data scientist or an analyst could think of a new data product to present to users, do the data processing to create it, and then put it in a data store that our application can then access easily. What I'm not sure about is what kind of data store would be good for this type of "testing" …
Background: The following is from the book Graph Databases, which covers a performance test mentioned in the book Neo4j in Action: Relationships in a graph naturally form paths. Querying, or traversing, the graph involves following paths. Because of the fundamentally path-oriented nature of the data model, the majority of path-based graph database operations are highly aligned with the way in which the data is laid out, making them extremely efficient. In their book Neo4j in Action, Partner and Vukotic perform an experiment …
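The path-following idea in the quote can be illustrated with a minimal in-memory sketch (not Neo4j itself): when relationships are stored as direct adjacency, traversal cost tracks the paths explored rather than the total data size, which is the property the Neo4j in Action experiment measures:

```python
from collections import deque

# Minimal adjacency-list "graph": following a relationship is a direct
# lookup, so a traversal only touches the neighborhood it explores.
friends = {
    "alice": ["bob", "carol"],
    "bob":   ["alice", "dave"],
    "carol": ["alice"],
    "dave":  ["bob", "erin"],
    "erin":  ["dave"],
}

def within_depth(start, max_depth):
    """Breadth-first traversal: everyone reachable within max_depth hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_depth:
            continue                      # don't expand past the horizon
        for friend in friends[person]:
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, depth + 1))
    seen.discard(start)
    return seen

reachable = within_depth("alice", 2)      # friends-of-friends of alice
```

The relational equivalent of each hop is a join over a relationship table, whose cost grows with the table, which is what makes deep traversals the interesting benchmark case.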
I'm trying to set up a cluster (1 namenode, 1 datanode) on AWS. I'm using the one-year free trial period of AWS, but the challenge is that the instance is created with 1GB of RAM. As I'm a student, I cannot afford much. Can anyone please suggest a solution? Also, it would be great if you could provide any links for setting up a multi-node Hadoop cluster with Spark on AWS. Note: I cannot try GCE, as my trial period is …
I'm currently facing a project that I could solve with a relational database, in a relatively painful way. Having heard so much about NoSQL, I'm wondering if there is not a more appropriate way of tackling it: suppose we are tracking a group of animals in a forest (n ~ 500) and would like to keep a record of a set of observations (this is a fictional scenario). We would like to store the following information in a database: a …