To data warehouse or not to data warehouse?

I was wondering if you would be so kind as to help me with a quick question (I will be happy to explain more if you are willing). I am researching and setting up a system to run a machine learning (training) job that finds correlations between a user's social media information (or other digital trails, from wearables etc.) and their scores on personality tests. The scores are in my PostgreSQL database (on AWS) and I need to decide …
Topic: redshift
Category: Data Science

Out of Memory Error when Selecting Data from Redshift Table

I am selecting data from an Amazon Redshift table with 500 million rows. I have 64-bit Python installed. Here is my code:

    import psycopg2  # PostgreSQL driver that SQLAlchemy uses for postgresql:// URLs
    from sqlalchemy import create_engine
    import pandas as pd

    engine = create_engine('postgresql://username:pwd@host/dbname')
    data_frame = pd.read_sql_query('SELECT * FROM table_name;', engine)

Every time I run the code I get an "Out of Memory" error. I have 16 GB of RAM. I am not sure how to resolve this issue. Would really appreciate any help on this! Thanks
Category: Data Science
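A minimal sketch of one common remedy for the out-of-memory question above: stream the result in fixed-size batches and aggregate as you go, instead of materialising all 500 million rows in a single DataFrame. So that it runs anywhere, it uses an in-memory SQLite table as a stand-in for Redshift together with the DB-API `fetchmany` method; the batch size, table contents and aggregation are illustrative assumptions. One caveat: psycopg2's default (client-side) cursor downloads the whole result on `execute`, so against Redshift you would presumably need a named, server-side cursor (`conn.cursor(name=...)`) for batching to actually save memory. pandas' `read_sql_query` also accepts a `chunksize` argument that returns an iterator of DataFrames, though whether that saves memory depends on the driver.

```python
import sqlite3

BATCH_SIZE = 10_000  # hypothetical batch size; tune to the available RAM


def stream_rows(conn, query, batch_size=BATCH_SIZE):
    """Yield the query result in batches instead of loading it all at once."""
    cur = conn.cursor()
    cur.execute(query)
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield rows
    cur.close()


# Stand-in for the Redshift table: a small in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    ((i, i * 0.5) for i in range(25_000)),
)

total_rows = 0
value_sum = 0.0
batch_count = 0
for batch in stream_rows(conn, "SELECT id, value FROM table_name"):
    # Only one batch of rows is held in memory at a time.
    batch_count += 1
    total_rows += len(batch)
    value_sum += sum(value for _, value in batch)

print(batch_count, total_rows, value_sum)
```

Selecting only the columns you need, or pushing the aggregation into the SQL itself, shrinks the transferred result further.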

Finding the maximum change in value using Redshift

Following is the problem I want to solve, but I don't know how to implement it. I am using Redshift to store data. Below is the format of the data stored in Redshift: the sales history of every product for every year, broken down by month.

    ProductId  Year  Month  Sales
    A          2018  1      ...
    A          2018  2      ...
    A          2018  3      ...
    A          2018  4      ...
    A          2018  5      ...
    B          2018  1      ...
    B          2018  2      ...
    B          2018  3      …
Category: Data Science
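The question is cut off, so this sketch assumes "maximum change" means the largest month-over-month swing in `Sales` for each product. In Redshift that is typically computed with the `LAG` window function, e.g. `sales - LAG(sales) OVER (PARTITION BY productid ORDER BY year, month)`, followed by taking the largest absolute value per product; the Python below performs the same calculation on made-up sample rows so the logic can be checked without a cluster. It also assumes no months are missing, since it compares consecutive rows.

```python
from collections import defaultdict

# Hypothetical sample rows in the question's layout: (ProductId, Year, Month, Sales).
rows = [
    ("A", 2018, 1, 100), ("A", 2018, 2, 130), ("A", 2018, 3, 90),
    ("B", 2018, 1, 50), ("B", 2018, 2, 55), ("B", 2018, 3, 120),
]


def max_monthly_change(rows):
    """Return {product: (signed change with the largest magnitude, (year, month))}."""
    history = defaultdict(list)
    for product, year, month, sales in rows:
        history[product].append(((year, month), sales))

    result = {}
    for product, points in history.items():
        points.sort()  # chronological order, like ORDER BY year, month
        best = None
        for (_, prev_sales), (period, sales) in zip(points, points[1:]):
            change = sales - prev_sales  # what sales - LAG(sales) computes in SQL
            if best is None or abs(change) > abs(best[0]):
                best = (change, period)
        if best is not None:
            result[product] = best
    return result


print(max_monthly_change(rows))
```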

Big Data - Data Warehouse Solutions?

I have a dozen databases that store different data, each of them 100 TB in size. All of the data is stored in AWS services such as RDS, Aurora and DynamoDB. I often find myself needing to perform "joins" across databases; for example, a student ID that appears in multiple databases with data that I want to gather. The joins are usually done after the data is streamed out of the database, since the data is not located …
Category: Data Science
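For the cross-database joins described above, a sketch of what "joining after streaming" amounts to: a hash join that builds an in-memory table keyed on the join column (here a hypothetical `student_id`) from one stream and probes it with the other. The record layouts are invented for illustration. The build side has to fit in memory, which at 100 TB per database it will not, and that limitation is a large part of why a data warehouse that performs the join itself is attractive here.

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

Record = Dict[str, object]


def hash_join(build: Iterable[Record], probe: Iterable[Record], key: str) -> Iterator[Tuple[Record, Record]]:
    """Inner-join two record streams on `key`; only `build` is held in memory."""
    table: Dict[Hashable, List[Record]] = {}
    for record in build:
        table.setdefault(record[key], []).append(record)
    # `probe` is consumed one record at a time, so it can be a long stream.
    for record in probe:
        for match in table.get(record[key], []):
            yield match, record


# Hypothetical rows streamed out of two different databases.
students = [{"student_id": 1, "name": "Ann"}, {"student_id": 2, "name": "Bo"}]
grades = [
    {"student_id": 1, "grade": "A"},
    {"student_id": 3, "grade": "B"},
    {"student_id": 1, "grade": "B+"},
]

joined = [{**student, **grade} for student, grade in hash_join(students, grades, "student_id")]
print(joined)
```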

Using regex in Redshift to find dollar values

I have a field in a Redshift table that contains user-generated text. It is where users can say how much they think something costs. Ideally it would just be a decimal, but it's a varchar, so users can type "I think this is worth $25", "I'd pay 55" or "$117". I'm trying to use regexp_substr to pull the amount out, specifically regexp_substr(f.comment_text, '\\$?[0-9]*'). But for some reason this doesn't work on a subset of entries (e.g. "Could do for $115"). …
Topic: redshift regex
Category: Data Science
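A likely reason the original pattern misbehaves: every part of `\$?[0-9]*` is optional, so it can match the empty string, and the leftmost match in "Could do for $115" is the empty string at position 1, before any `$` or digit is reached. The sketch below uses Python's `re` as a stand-in for Redshift's POSIX regex engine (an assumption; the leftmost-match result is the same for these simple patterns, but not every construct carries over) and shows a fix that requires at least one digit, with an optional decimal part. In Redshift the equivalent call would presumably be `regexp_substr(f.comment_text, '\\$?[0-9]+(\\.[0-9]+)?')`, which has not been tested here.

```python
import re

# The question's pattern: both parts are optional, so it can match "".
ORIGINAL = re.compile(r"\$?[0-9]*")
# Assumed fix: at least one digit, optionally followed by a decimal part.
FIXED = re.compile(r"\$?[0-9]+(\.[0-9]+)?")


def first_match(pattern: re.Pattern, text: str) -> str:
    """Return the leftmost match (as REGEXP_SUBSTR does), or '' if there is none."""
    match = pattern.search(text)
    return match.group(0) if match else ""


samples = ["I think this is worth $25", "I'd pay 55", "$117", "Could do for $115"]
for text in samples:
    print(f"{text!r}: original={first_match(ORIGINAL, text)!r} fixed={first_match(FIXED, text)!r}")
```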

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.