How to make R or Python as fast as SAS for ODBC Oracle queries?

I want to use R or Python to query big structured SQL-type data, but they are very slow compared to SAS.

I tried using R and Python to run an Oracle ODBC passthrough query that returns 1.3 million records. The query took 8-15 seconds in SAS, 20-30 seconds in Python, and 50-70 seconds in R. Does anyone know why?

R Packages Used:

First I used the RODBC package in R to query the Oracle database. Then I tried the ROracle package, but both packages were much slower than SAS.

Python Packages Used:

For Python, I used Oracle's cx_Oracle package for the query.

Thanks a lot, Sean



This answers some of your questions from a Python perspective:

Is Python any faster?

This question is a bit tricky to answer because it depends on how you use Python; Python is not a fast language per se. However, the pandas library in Python has been reported to handle tables of 33M-100M rows, see this. I have used it myself to handle around 10M rows from a Postgres table. For a detailed experiment using pandas, see this. The linked post applies several operations to datasets of 88M rows and 74 columns.

Do we need Hadoop or parallel processing or something else to make R/Python as fast as SAS?

Before trying Hadoop or Spark, I recommend following some optimization tips:

[1] A Beginner’s Guide to Optimizing Pandas Code for Speed

[2] Using pandas with large data

This link (Don't use Hadoop - your data isn't that big) can be useful too.
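One tip on the fetch side, as a minimal sketch rather than a measured fix: Python DB-API cursors (cx_Oracle's included) expose an `arraysize` attribute that sets how many rows `fetchmany()` retrieves per call, and raising it is a common first thing to try on a large result set. The example below uses the standard-library `sqlite3` module as a stand-in for the Oracle connection so it runs anywhere. The `records` table, the `fetch_in_batches` helper, and the batch size of 10,000 are assumptions made for illustration; whether this narrows the gap with SAS on your system would need to be timed.

```python
import sqlite3


def fetch_in_batches(cursor, query, batch_size=10_000):
    """Yield lists of up to `batch_size` rows (hypothetical helper)."""
    cursor.arraysize = batch_size  # rows returned per fetchmany() call
    cursor.execute(query)
    while True:
        rows = cursor.fetchmany()  # defaults to cursor.arraysize rows
        if not rows:
            break
        yield rows


# Stand-in database: an in-memory SQLite table instead of Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    ((i, i * 0.5) for i in range(25_000)),
)

cur = conn.cursor()
total_rows = 0
batch_count = 0
for batch in fetch_in_batches(cur, "SELECT id, value FROM records"):
    total_rows += len(batch)  # process each batch here instead of counting
    batch_count += 1
conn.close()
```

With a real Oracle connection the same helper should work with a cx_Oracle cursor, since it relies only on `arraysize`, `execute`, and `fetchmany` from the DB-API.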

For a comparison of SAS and pandas, you can read the pandas documentation on the topic, or this.

Is it because R and Python are matrix-based languages and SAS is more traditional-database-oriented?

Python is not a matrix-based language. As far as I know, the language does not offer any matrix-handling capabilities by default. I think you're referring to the numpy/scipy stack, which is a separate library.
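To illustrate the point, here is a small sketch using only built-in lists. The `matmul` helper is hypothetical and written just for this example; it shows that matrix arithmetic has to be spelled out by hand in core Python, which is the work a library such as numpy takes over.

```python
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# On built-in lists, + concatenates and * repeats; neither is matrix math.
concatenated = a + b   # a 4-row nested list, not an element-wise sum
repeated = [1, 2] * 2  # [1, 2, 1, 2], not [2, 4]


def matmul(x, y):
    """Multiply two nested-list matrices with explicit loops."""
    return [
        [sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
        for row in x
    ]


product = matmul(a, b)
```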
