This is a situation I've been running into a lot recently: customers give me a database with many tables that they themselves don't fully understand, and then ask me to build a model to predict future revenue, classify which users may be valuable, or something along those lines. To be honest, extracting useful data from an unknown database exhausts me. For example, I need to figure out which table is the user table, the product table, or the transaction table ... and which columns can be used to join (there …
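For the schema-exploration part of that question, if the database actually declares foreign keys, one place to start is INFORMATION_SCHEMA. Below is a minimal sketch, assuming a MySQL-style INFORMATION_SCHEMA, a DB-API driver that uses the `%s` parameter style (e.g. mysql-connector-python), and an already-open connection `conn`; the schema name "shop_db" is only a placeholder.

```python
# Sketch: list declared foreign keys so you can see which columns are meant
# to join which tables (assumes MySQL-style information_schema).
FK_QUERY = """
SELECT table_name, column_name,
       referenced_table_name, referenced_column_name
FROM information_schema.key_column_usage
WHERE referenced_table_name IS NOT NULL
  AND table_schema = %s
"""

def list_foreign_keys(conn, schema_name):
    """Return (table, column, referenced_table, referenced_column) tuples."""
    cur = conn.cursor()
    cur.execute(FK_QUERY, (schema_name,))
    rows = cur.fetchall()
    cur.close()
    return rows

# Hypothetical usage:
# for tbl, col, ref_tbl, ref_col in list_foreign_keys(conn, "shop_db"):
#     print(f"{tbl}.{col} -> {ref_tbl}.{ref_col}")
```

If no foreign keys are declared, this returns nothing and you are back to guessing joins from column names and value overlaps.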
I've studied this exercise in class, but I cannot figure out why, when I push down the GROUP BY clause, I can remove the RLCode attribute from the GROUP BY. Does this change the meaning of the query tree?
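Without the exact query tree it is hard to be sure, but the usual argument (just a sketch, assuming the remaining grouping attributes $g$ functionally determine RLCode, for instance because the join equates RLCode with a key that is still being grouped on) is that a functionally determined attribute does not refine the grouping:

$$
g \rightarrow \text{RLCode} \quad\Longrightarrow\quad \gamma_{\,g,\ \text{RLCode}}(E)\ \text{and}\ \gamma_{\,g}(E)\ \text{induce the same partition of } E,
$$

so dropping RLCode from the GROUP BY list leaves every group, and hence every aggregate value, unchanged.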
We created a social network application for eLearning purposes. It's an experimental project that we are researching in our lab. It has been used in some case studies for a while, and the data in our relational DBMS (SQL Server 2008) is getting big. It's a few gigabytes now and the tables are highly connected to each other. Performance is still fine, but when should we consider other options? Is it a matter of performance?
I finished my Economics thesis using RStudio, but my script was very slow due to massive RAM consumption during the process. My case: I had a massive dataset (stock prices at daily frequency for 10 years, ~700 stocks, i.e. $3500\times700$), and I was taking each stock as a vector, decomposing it into wavelets and a CF filter (two datasets of $28000\times700$), and applying Benford's law (two datasets of $9\times700$). The problem: RStudio was storing my datasets in memory and they were consuming …
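The memory-friendly pattern here is to process one stock (column) at a time and append results to disk, rather than keeping every intermediate matrix in RAM. A minimal sketch follows, illustrated in Python/NumPy rather than R; the file names are placeholders and `process_stock` stands in for the wavelet/CF-filter step.

```python
import numpy as np

def benford_first_digit_counts(series):
    """Count leading digits 1-9 of the absolute values in `series`."""
    vals = np.abs(series[series != 0])
    # Leading digit = value scaled into [1, 10) and truncated.
    digits = (vals / 10.0 ** np.floor(np.log10(vals))).astype(int)
    return np.bincount(digits, minlength=10)[1:10]

def process_stock(column):
    """Placeholder for the per-stock work (wavelet / CF-filter decomposition)."""
    return column - column.mean()          # stand-in transform

prices = np.load("prices_3500x700.npy")    # hypothetical file, shape (3500, 700)
with open("benford_counts.csv", "w") as out:
    for j in range(prices.shape[1]):
        transformed = process_stock(prices[:, j])      # only one column in RAM
        counts = benford_first_digit_counts(transformed)
        out.write(",".join(map(str, counts)) + "\n")   # append, then discard
```

The same idea translates directly to R (loop over columns, write each result with `write.table(..., append = TRUE)`, and let the intermediates be garbage-collected).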
I have a huge dataset from a relational database for which I need to create a classification model. Normally in this situation I would use Inductive Logic Programming (ILP), but due to special circumstances I can't do that. The other way to tackle this would be to simply aggregate the values when I have a foreign relation. However, I have thousands of important and distinct rows for some nominal attributes (e.g. a patient with a relation to several …
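One way to make that aggregation concrete is plain propositionalization: fold each one-to-many relation into one row per entity, and limit high-cardinality nominal attributes to their most frequent values. A minimal pandas sketch, with hypothetical table and column names (`patient_id`, `diagnosis_code`, `dosage`):

```python
import pandas as pd

def aggregate_relation(related, key="patient_id", nominal="diagnosis_code",
                       numeric="dosage", top_k=20):
    # Numeric column: simple summary statistics per patient.
    num_feats = related.groupby(key)[numeric].agg(["count", "mean", "max"])
    num_feats.columns = [f"{numeric}_{stat}" for stat in num_feats.columns]

    # Nominal column: keep only the top_k most frequent categories, then
    # count occurrences per patient (a sparse one-hot-style encoding).
    top_values = related[nominal].value_counts().head(top_k).index
    trimmed = related[related[nominal].isin(top_values)]
    nom_feats = pd.crosstab(trimmed[key], trimmed[nominal]).add_prefix(f"{nominal}=")

    return num_feats.join(nom_feats, how="outer").fillna(0)

# Usage: features = aggregate_relation(prescriptions_df)
# then join `features` onto the main patient table before training a classifier.
```

Capping the nominal attribute at its top-k values (or hashing the rest into an "other" bucket) is what keeps the thousands of distinct values from exploding the feature space.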
I have a MySQL database with the following format:

    id   string
    1    foo1...
    2    foo2...
    ..   ...

There are >100k entries in this db. What I want to do is, for each string, compare it to every other string and store some metric of the comparison. Doing this will essentially yield a 2D matrix of size NxN, where N is the number of rows in the db. My initial thought was creating another db where each index corresponds to the …
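A minimal sketch of the pairwise pattern, using SQLite as a stand-in for MySQL, `difflib`'s ratio as a stand-in for "some metric", and a hypothetical table named `strings`: compare each pair once ($i < j$) and store only interesting pairs in a separate table instead of materialising the full $N \times N$ matrix.

```python
import sqlite3
from difflib import SequenceMatcher
from itertools import combinations

conn = sqlite3.connect("strings.db")             # hypothetical database file
conn.execute("""CREATE TABLE IF NOT EXISTS similarity
                (id_a INTEGER, id_b INTEGER, score REAL,
                 PRIMARY KEY (id_a, id_b))""")

rows = conn.execute("SELECT id, string FROM strings").fetchall()

THRESHOLD = 0.8                                  # keep only similar pairs
batch = []
for (id_a, s_a), (id_b, s_b) in combinations(rows, 2):   # each pair once
    score = SequenceMatcher(None, s_a, s_b).ratio()
    if score >= THRESHOLD:
        batch.append((id_a, id_b, score))
conn.executemany("INSERT OR REPLACE INTO similarity VALUES (?, ?, ?)", batch)
conn.commit()
```

Note the cost: with >100k strings the all-pairs loop is on the order of $10^{10}$ comparisons, so in practice some blocking or locality-sensitive hashing to prune candidate pairs is usually needed before computing the metric at all.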