Out of Memory Error when Selecting Data from Redshift Table

I am selecting data from an Amazon Redshift table with 500 million rows. I have 64-bit Python installed.

Code:

import psycopg2
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('postgresql://username:pwd@host/dbname')
data_frame = pd.read_sql_query('SELECT * FROM table_name;', engine)

Every time I run the code I get an "Out of Memory" error. I have 16 GB of RAM. I am not sure how to resolve this issue.

Would really appreciate any help on this! Thanks

Topic: redshift, python

Category: Data Science


First, pulling a whole 500-million-row table through SQLAlchemy into a single pandas DataFrame is not workable at this scale. The general principle is to push filtering and aggregation into the database and download only what you need; the Kaggle course at https://www.kaggle.com/learn/intro-to-sql (taught with Google BigQuery, but the idea applies equally to Redshift) is a good introduction.
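For instance, if what you ultimately want is a summary rather than the raw rows, let Redshift compute it so only a small result crosses the wire. A minimal sketch, reusing the connection string from the question; the column name some_category is a hypothetical placeholder:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://username:pwd@host/dbname')

# The GROUP BY runs inside Redshift; only the aggregated rows are downloaded.
summary = pd.read_sql_query(
    'SELECT some_category, COUNT(*) AS n FROM table_name GROUP BY some_category;',
    engine)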

Also, you are likely fetching more data than your machine can hold in memory. Putting a limit on the query will at least confirm that:

data_frame = pd.read_sql_query('SELECT * FROM table_name LIMIT 1000000;', engine)
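If you genuinely need to process all of the rows, a common pattern is to stream the result in chunks instead of materializing one giant DataFrame. A minimal sketch, assuming the same connection string and table; process() is a hypothetical placeholder for whatever per-chunk work you do (aggregate, append to a file, etc.):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://username:pwd@host/dbname')

# stream_results=True asks SQLAlchemy/psycopg2 for a server-side cursor,
# so rows are fetched from Redshift incrementally instead of all at once.
with engine.connect().execution_options(stream_results=True) as conn:
    # chunksize makes read_sql_query return an iterator of DataFrames,
    # each holding at most 100,000 rows.
    for chunk in pd.read_sql_query('SELECT * FROM table_name;', conn, chunksize=100000):
        process(chunk)  # hypothetical: replace with your own per-chunk logic

This keeps peak memory proportional to the chunk size rather than to the full table.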
