Do DBMS decrease Memory requirements?

I finished my Economics thesis using RStudio, but my script was very slow due to massive RAM consumption during the process.


My Case

I had a massive dataset (stock prices in daily frequency for 10 years, ~700 stocks i.e. $3500\times700$) and I was picking each stock as a vector to decompose it into wavelets and CF filter (2 datasets $28000\times700$) and apply benford's law (two datasets $9\times700$).


The Problem

RStudio was storing my datasets in memory and they were consuming a significant proportion just by touching them.


Question

I started learning basic SQL commands and I found out that I can call specific columns from a certain table. Would my script be more efficient if I was calling my stocks one by one as vectors from there instead of picking them directly from RStudio? In other words, do queries call the whole dataset and then retrieve the requested values or do they follow a kind of shortcut to be memory efficient? If not, what's the purpose of using databases for domestic use?

Topic rstudio r relational-dbms efficiency

Category Data Science


Welcome to the site!

Would my script be more efficient if I was calling my stocks one by one as vectors from there instead of picking them directly from RStudio?

Yes as you told, calling the specific columns from Database is better than extracting everything. I generally use Dataframes rather than vectors. Dataframes are very efficient and the transformation using Dataframes is much easier and better. You can go through this link, for better understanding on Vector Vs Dataframe.

Currently, I use 1 year data and consists of 100,000 records. Query takes like 8 minutes to extract the data in Rstudio using SQL DB and store it in an R-dataframe. After the extraction is over I don't hit the database meaning no read and write on the database by which the DB Server is not effected. After doing all the modeling and finally I commit the data into SQL DB to store the committed data(results).

In other words, do queries call the whole dataset and then retrieve the requested values or do they follow a kind of shortcut to be memory efficient?

It depends on how you write the query, in my experience when I tried using temp-tables and is much efficient but when tried running that query in R but it wasn't supporting so had to stick to nested queries. If you don't write the nested queries in an optimized manner then it takes more time to extract the data(memory lekage).

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.