VM image for data science projects

There are numerous tools available for data science tasks, and it is cumbersome to install all of them and build a complete working system by hand.

Is there a Linux or macOS image with Python, R and other open-source data science tools already installed, so that people can use it right away? An Ubuntu or other lightweight OS with the latest versions of Python and R (including IDEs) and other open-source data visualization tools would be ideal. I haven't come across one in a quick Google search.
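Whichever image you end up with, it helps to check quickly what is actually installed before relying on it. The script below is a minimal sketch using only the Python standard library (`shutil.which` and `importlib.util.find_spec`). The program and package names in `EXECUTABLES` and `PY_PACKAGES` are illustrative assumptions, not a list taken from any particular image; edit them to match your needs.

```python
"""Report which common data science tools are available in the current environment.

The tool names below are illustrative assumptions; edit the lists for your own image.
"""
import importlib.util
import shutil

# Command-line programs we hope to find on PATH (hypothetical selection).
EXECUTABLES = ["python3", "R", "Rscript", "git", "docker"]
# Top-level Python packages we hope can be imported (hypothetical selection).
PY_PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_executables(names):
    """Map each program name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def check_packages(names):
    """Map each top-level package name to True if it is importable, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def report():
    """Print a found/missing line per tool and return the list of missing names."""
    missing = []
    for name, path in check_executables(EXECUTABLES).items():
        print(f"{name:12} {'found at ' + path if path else 'MISSING'}")
        if path is None:
            missing.append(name)
    for name, ok in check_packages(PY_PACKAGES).items():
        print(f"{name:12} {'importable' if ok else 'MISSING'}")
        if not ok:
            missing.append(name)
    return missing


if __name__ == "__main__":
    report()
```

Running it inside a fresh VM or container gives a quick inventory; anything reported as `MISSING` is what you would still have to install yourself.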

Please let me know if there are any, or if any of you have created one for yourselves. I assume some universities might have their own VM images; please share links to them.



Today I used this repository and built it with Docker. It builds a Docker image with Spark, based on a Hadoop image from the same owner. If you want to use Spark, it has a Python API called PySpark.


Have you tried Cloudera's QuickStart VM?

I found it very easy to run, and it includes open-source software such as Mahout and Spark.


If you are looking for a VM with a bunch of tools preinstalled, try the Data Science Toolbox.


While Docker images are now more fashionable, I personally find Docker not very user-friendly, even for advanced users. If you are OK with using non-local VM images and can use Amazon Web Services (AWS) EC2, consider the R-focused images for data science projects pre-built by Louis Aslett. They contain very recent, if not the latest, versions of Ubuntu LTS, R and RStudio Server. You can access them here.

Besides the main components listed above, the images include many other useful data science tools, for example support for LaTeX, ODBC and OpenGL, as well as Git, optimized numerical libraries and more.


Another option that has become popular recently is Docker (https://www.docker.com). Docker is a container platform that lets you create and maintain a working environment quickly and easily.
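As a rough illustration of what "create a working environment" looks like in practice, the sketch below assembles and runs a `docker run` command from Python. It assumes Docker is installed and uses `jupyter/datascience-notebook` from the Jupyter Docker Stacks project (Python, R and Julia with Jupyter) as an example image; that image name and its registry are assumptions, so check the project's current documentation and substitute any image you prefer.

```python
"""Start a containerised data science environment with Docker (sketch).

Assumes Docker is installed. IMAGE is an example choice, not a requirement.
"""
import os
import shutil
import subprocess

IMAGE = "jupyter/datascience-notebook"  # assumption: swap in your preferred image


def build_run_command(image=IMAGE, host_port=8888, workdir=None):
    """Return the argument list for `docker run`; nothing is executed here."""
    workdir = os.path.abspath(workdir or os.getcwd())
    return [
        "docker", "run",
        "--rm",                                # remove the container when it stops
        "-it",                                 # interactive terminal, so Ctrl+C stops it
        "-p", f"{host_port}:8888",             # Jupyter listens on 8888 inside the container
        "-v", f"{workdir}:/home/jovyan/work",  # share a local folder with the container
        image,
    ]


def main():
    if shutil.which("docker") is None:
        raise SystemExit("docker not found on PATH; install Docker first")
    cmd = build_run_command()
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

Keeping the command construction separate from execution makes it easy to print or tweak the exact `docker run` line (port, mounted folder, image) before launching anything.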

Hope that helps.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.