Solution to discover/infer the usage and meaning of tables in an unknown database?

Recently I have often been in this situation: customers give me a database with many tables that they themselves don't fully understand, then ask me to build a model that predicts future revenue, classifies which users may be valuable, or something similar. To be honest, extracting useful data from an unknown database exhausts me. For example, I need to figure out which table is the user table, the product table, or the transaction table ... which columns can be used to join (there …
Category: Data Science
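A common first step for the question above is to profile the schema automatically. The sketch below is one possible approach, not a complete answer: it assumes a SQLite copy of the database (the question does not name a DBMS), lists each table's row count and columns, and flags column pairs in different tables whose distinct values overlap as candidate join keys. The helper names `profile_tables` and `candidate_joins`, the toy `users`/`orders` tables, and the 0.5 overlap threshold are all hypothetical.

```python
import sqlite3
from itertools import combinations


def profile_tables(conn):
    """Map each user table to its row count and column names."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    profile = {}
    for t in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
        cols = [r[1] for r in conn.execute(f'PRAGMA table_info("{t}")')]
        rows = conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
        profile[t] = {"rows": rows, "columns": cols}
    return profile


def candidate_joins(conn, profile, min_overlap=0.5):
    """List (table1, col1, table2, col2, overlap) for columns in different
    tables whose distinct values overlap; overlap = |A & B| / min(|A|, |B|)."""
    distinct = {}
    for t, info in profile.items():
        for c in info["columns"]:
            values = conn.execute(f'SELECT DISTINCT "{c}" FROM "{t}"')
            distinct[(t, c)] = {r[0] for r in values if r[0] is not None}
    joins = []
    for (t1, c1), (t2, c2) in combinations(distinct, 2):
        a, b = distinct[(t1, c1)], distinct[(t2, c2)]
        if t1 == t2 or not a or not b:
            continue  # only compare non-empty columns from different tables
        overlap = len(a & b) / min(len(a), len(b))
        if overlap >= min_overlap:
            joins.append((t1, c1, t2, c2, overlap))
    return joins


# Tiny hypothetical schema to show the output shape.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders VALUES (10, 1, 9.5), (11, 1, 20.25), (12, 2, 5.75);
""")
profile = profile_tables(conn)
print(profile)
print(candidate_joins(conn, profile))
```

Loading every column's distinct values into memory only works for small databases; on a large one you would sample rows or restrict the check to integer and short text columns.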

Pushing down Group By clause

I've studied this exercise in class, but I cannot figure out why, when I push down the Group By clause, I can remove the RLCode attribute from the GroupBy. Does this change the meaning of the query tree?
Category: Data Science

The data in our relational DBMS is getting big. Is it time to move to NoSQL?

We created a social network application for eLearning purposes. It's an experimental project that we are researching in our lab. It has been used in some case studies for a while, and the data in our relational DBMS (SQL Server 2008) is getting big. It's a few gigabytes now, and the tables are highly connected to each other. Performance is still fine, but when should we consider other options? Is it a matter of performance?
Category: Data Science

Do DBMSs decrease memory requirements?

I finished my Economics thesis using RStudio, but my script was very slow due to massive RAM consumption during the process. My case: I had a massive dataset (daily stock prices over 10 years, ~700 stocks, i.e. $3500\times700$), and I was picking each stock as a vector to decompose it into wavelets and a CF filter (two datasets of $28000\times700$) and to apply Benford's law (two datasets of $9\times700$). The problem: RStudio was storing my datasets in memory and they were consuming …
Category: Data Science
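One way a database can reduce memory use is by keeping the full dataset on disk and pulling in only the slice being processed. The sketch below illustrates that pattern in Python with the standard `sqlite3` module rather than R (an assumption, since the question uses RStudio): it reads one stock at a time, computes its first-digit (Benford) frequencies, and writes the nine results back before moving to the next stock. The table names `prices` and `benford`, the column layout, and the toy data are hypothetical; the wavelet and CF-filter steps are not shown.

```python
import sqlite3
from collections import Counter


def first_digit(x):
    """Leading non-zero digit of a positive number, or None otherwise."""
    if x is None or x <= 0:
        return None
    return int(f"{x:.15e}"[0])  # scientific notation starts with that digit


def benford_per_stock(conn):
    """Compute first-digit frequencies stock by stock, writing each result back
    to the database so only one stock's series is held in memory at a time."""
    conn.execute("CREATE TABLE IF NOT EXISTS benford "
                 "(stock TEXT, digit INTEGER, freq REAL)")
    stocks = [r[0] for r in conn.execute(
        "SELECT DISTINCT stock FROM prices ORDER BY stock")]
    for s in stocks:
        series = [r[0] for r in conn.execute(
            "SELECT price FROM prices WHERE stock = ? ORDER BY day", (s,))]
        digits = [d for d in map(first_digit, series) if d is not None]
        counts = Counter(digits)
        n = len(digits) or 1  # avoid division by zero for an empty series
        conn.executemany("INSERT INTO benford VALUES (?, ?, ?)",
                         [(s, d, counts[d] / n) for d in range(1, 10)])
        conn.commit()  # persist this stock's 9 rows before moving on


# On a real dataset this would be a file path, e.g. "prices.db" (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (stock TEXT, day INTEGER, price REAL)")
conn.execute("CREATE INDEX idx_prices_stock ON prices (stock, day)")
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", [
    ("AAA", 1, 1.5), ("AAA", 2, 12.0), ("AAA", 3, 130.0), ("AAA", 4, 2.2),
    ("BBB", 1, 9.1), ("BBB", 2, 95.0),
])
benford_per_stock(conn)
```

The same loop structure could apply to the wavelet and CF-filter outputs: write each stock's result to its own rows instead of growing a $28000\times700$ object in RAM.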

Relational Data Mining without ILP

I have a huge dataset from a relational database for which I need to create a classification model. Normally in this situation I would use Inductive Logic Programming (ILP), but due to special circumstances I can't. The other way to tackle this would be to aggregate the values whenever I have a foreign relation. However, I have thousands of important and distinct rows for some nominal attributes (e.g., a patient with a relation to several …
Category: Data Science
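The aggregation route mentioned in the question turns each one-to-many relation into a fixed set of per-entity columns (often called propositionalization). A minimal sketch, assuming rows arrive as `(entity_id, nominal_value)` pairs: to cope with thousands of distinct nominal values, it keeps separate counts only for the `top_k` most frequent values and pools the rest into one "other" column. The function name `aggregate_relation`, the `top_k` cut-off, and the sample patient/code rows are illustrative assumptions, not a prescribed method.

```python
from collections import Counter, defaultdict


def aggregate_relation(rows, top_k=3):
    """Flatten a one-to-many relation of (entity_id, nominal_value) rows into
    one fixed-length feature vector per entity."""
    rows = list(rows)
    freq = Counter(value for _, value in rows)
    kept = [value for value, _ in freq.most_common(top_k)]
    kept_set = set(kept)
    names = (["n_rows", "n_distinct"]
             + [f"count_{value}" for value in kept] + ["count_other"])

    grouped = defaultdict(list)
    for entity, value in rows:
        grouped[entity].append(value)

    features = {}
    for entity, values in grouped.items():
        counts = Counter(values)
        vector = [len(values), len(counts)]
        vector += [counts[value] for value in kept]  # frequent values get own column
        vector.append(sum(n for value, n in counts.items()
                          if value not in kept_set))  # everything else pooled
        features[entity] = vector
    return names, features


# Hypothetical rows: (patient_id, diagnosis_code)
rows = [(1, "A"), (1, "A"), (1, "B"), (2, "A"), (2, "C"), (2, "D"),
        (3, "B"), (3, "A")]
names, features = aggregate_relation(rows, top_k=2)
print(names)
print(features)
```

Entities with no related rows do not appear in the result and would get a row of zeros when joined back to the main table.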

Storing a Big Matrix in a Database

I have a MySQL database with the following format:

| id | string |
|----|--------|
| 1 | foo1... |
| 2 | foo2... |
| .. | ... |

There are more than 100k entries in this db. For each string, I want to compare it to every other string and store some metric of the comparison. This will essentially yield a 2D matrix of size $N\times N$, where $N$ is the number of rows in the db. My initial thought was creating another db where each index corresponds to the …
Category: Data Science
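For scale: with $N = 100{,}000$ rows the full matrix has $N^2 = 10^{10}$ cells, and if the metric is symmetric only the pairs $i < j$ are needed, which is $N(N-1)/2 = 4{,}999{,}950{,}000$. A common alternative to a wide $N\times N$ table is a long table with one row per pair. The sketch below assumes SQLite instead of MySQL, uses `difflib.SequenceMatcher.ratio()` as a stand-in for the unspecified comparison metric, stores each pair once with `id_a < id_b`, and keeps only pairs scoring at or above a hypothetical threshold; the table and function names are also illustrative.

```python
import sqlite3
from difflib import SequenceMatcher
from itertools import combinations


def build_pair_table(conn, threshold=0.5):
    """Store one row per pair (id_a < id_b) whose score reaches the threshold."""
    conn.execute("CREATE TABLE IF NOT EXISTS pairs ("
                 "id_a INTEGER, id_b INTEGER, score REAL, "
                 "PRIMARY KEY (id_a, id_b))")
    items = conn.execute("SELECT id, string FROM strings ORDER BY id").fetchall()
    rows = []
    # ORDER BY id guarantees ia < ib for every pair produced by combinations().
    for (ia, sa), (ib, sb) in combinations(items, 2):
        score = SequenceMatcher(None, sa, sb).ratio()
        if score >= threshold:
            rows.append((ia, ib, score))
    conn.executemany("INSERT INTO pairs VALUES (?, ?, ?)", rows)
    conn.commit()


def lookup(conn, i, j):
    """Score for the unordered pair {i, j}, or None if it was not stored."""
    a, b = min(i, j), max(i, j)
    row = conn.execute("SELECT score FROM pairs WHERE id_a = ? AND id_b = ?",
                       (a, b)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE strings (id INTEGER PRIMARY KEY, string TEXT)")
conn.executemany("INSERT INTO strings VALUES (?, ?)",
                 [(1, "foo1"), (2, "foo2"), (3, "zzzz")])
build_pair_table(conn)
print(lookup(conn, 2, 1))
```

Comparing 100k strings pairwise in pure Python is far too slow to run as written; the point here is the storage layout. At real scale you would insert in chunks and consider blocking or approximate-neighbour techniques so that most pairs are never compared.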

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.