Which coding language to use for very large datasets?

I have a very large panel data set (around 50M observations, about 3 GB in size). I would like to run an algorithm on it. The algorithm basically just loops over the observations. Ideally, I would like to use NumPy functions, but I guess this would be really slow. Would R or MATLAB be good for this? Are there any other Python packages I could use?

Thanks in advance for any help.

Topic programming matlab python r bigdata

Category Data Science


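Python handles large-scale datasets well, as long as the heavy lifting is done by array libraries such as NumPy rather than by explicit Python loops over observations. R would be my second choice.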
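To illustrate why a loop over observations does not have to be slow in Python, here is a minimal sketch that subtracts each entity's mean from its observations, once with an explicit loop and once with NumPy array operations. The panel layout (100 entities, 100 periods), the function names and the demeaning task itself are assumptions made for the example; the question does not say what the actual algorithm does.

```python
import numpy as np

def loop_demean(values, ids, n_ids):
    """Reference version: explicit Python loop over observations."""
    sums = [0.0] * n_ids
    counts = [0] * n_ids
    for v, i in zip(values, ids):
        sums[i] += v
        counts[i] += 1
    return [v - sums[i] / counts[i] for v, i in zip(values, ids)]

def vectorized_demean(values, ids, n_ids):
    """Same computation with NumPy array operations, no Python-level loop."""
    sums = np.bincount(ids, weights=values, minlength=n_ids)  # per-entity sum
    counts = np.bincount(ids, minlength=n_ids)                # per-entity count
    means = sums / counts
    return values - means[ids]  # broadcast each entity's mean back to its rows

# Hypothetical toy panel: 100 entities observed over 100 periods each.
n_entities, n_periods = 100, 100
ids = np.repeat(np.arange(n_entities), n_periods)
values = np.random.default_rng(0).normal(size=ids.size)

result = vectorized_demean(values, ids, n_entities)
```

Both functions return the same numbers. The vectorized one pushes the per-observation work into compiled NumPy code, which is usually what makes the difference at tens of millions of rows. If the real algorithm carries state from one observation to the next, it may not vectorize this directly.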

Try scikit-learn if you want to solve a machine learning problem, and pandas for manipulating and extracting data.
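If the file does not fit comfortably in memory, pandas can read it in pieces. The following is a rough sketch that computes per-entity means chunk by chunk. The column names `entity`, `period` and `y`, the helper `entity_means`, and the small in-memory CSV standing in for the real file are all assumptions made for the example.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the large file: a small CSV held in memory.
demo = pd.DataFrame({
    "entity": np.repeat(np.arange(5), 200),
    "period": np.tile(np.arange(200), 5),
    "y": np.random.default_rng(0).normal(size=1000),
})
csv_source = io.StringIO(demo.to_csv(index=False))

def entity_means(source, chunksize):
    """Accumulate per-entity sums and counts one chunk at a time."""
    totals = None
    for chunk in pd.read_csv(source, chunksize=chunksize):
        part = chunk.groupby("entity")["y"].agg(["sum", "count"])
        # An entity may be split across chunks, so add the partial results.
        totals = part if totals is None else totals.add(part, fill_value=0)
    return totals["sum"] / totals["count"]

means = entity_means(csv_source, chunksize=150)
```

With a real file, `source` would be the file path and `chunksize` something much larger, chosen so that one chunk fits in memory.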
