I am a programmer. How do I get into the field of Data Science?

First of all, this term sounds so obscure.

Anyway, I am a software programmer. One of the languages I can code in is Python. As for data, I can use SQL and do data scraping. From the many articles I have read so far, Data Science seems to be all about being good at:

1- Stats

2- Algebra

3- Data Analysis

4- Visualisation.

5- Machine Learning.

What I know so far:

1- Python Programming

2- Data scraping in Python

Can you experts guide me or suggest a roadmap to brush up on both theory and practice? I have given myself a time frame of around 8 months.

Topic career beginner

Category Data Science


Basic courses like Andrew Ng's Machine Learning on Coursera, or the (free) book An Introduction to Statistical Learning, would be my recommendations for a first step in Data Science / Machine Learning. Both cover basic statistical concepts and the main modelling traps, and are enough to start your first projects.

Then I would suggest you find a domain of application, learn about it and put your knowledge to work. Depending on what you want to achieve, you will have plenty of occasions to dive into specific fields if needed (a given library, advanced statistics, domain-oriented tools such as those for NLP or computer vision...).


If you are a programmer, you could start with a Decision Tree classifier and focus on understanding the math behind Entropy and Information Gain. It is essential to understand that ML is really all about data compression.
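To make the entropy and information gain idea concrete, here is a minimal sketch in plain Python. The function names and the toy labels are made up for illustration; a real decision tree library would compute the same quantities over every candidate split.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


def information_gain(labels, groups):
    """Entropy of `labels` minus the size-weighted entropy of `groups`.

    `groups` is the same set of labels after being partitioned by a split.
    """
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted


# Toy data (hypothetical): four examples, two classes.
labels = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]  # each group is pure
useless_split = [["yes", "no"], ["yes", "no"]]  # groups look like the parent

print(entropy(labels))                          # 1.0
print(information_gain(labels, perfect_split))  # 1.0
print(information_gain(labels, useless_split))  # 0.0
```

A decision tree learner that uses this criterion picks, at each node, the split with the highest information gain.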

I strongly disagree with some of the other answers on the value of practical courses. The most valuable thing for ML is math: number theory, linear algebra and probability theory.

If you don't focus on math, the only thing you will learn is how to use some library to do magic. That is not machine learning, and not science at all.


Data Science is so broad that there are many different paths into it. It is usually split into 4 or 5 different types of role.

You can see from the other posts in this topic people coming from an Applied Statistics background (applying the right algorithm), people coming from a programming background (participating in Kaggle), and others applying it from a business background.

Savvy companies may refer to a programming-skewed person as a "Data Engineer". Big companies also use each type in their data science teams, so demonstrating good T-shaped skills would be a good thing.


David has a good point. I would suggest you focus on whatever drives your interest most. It's the only way to succeed in any kind of effort. If you want to build something cool, start with that. If you want to read a book, that's good too. The starting point doesn't matter. A few days in, you will have a better understanding of what you want and what you should do next.


If you want to be a practical person with true knowledge, start with math (calculus, probability and statistics, linear algebra). At every step, try to implement everything in code; Python is nice for this. Once you have a good grounding, play with real data and solve real problems.

Courses:

Linear algebra - edX LAFF or Coding the Matrix

Stat - edX Stat 2x (Berkeley)

Calculus - read... it's simple
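As one example of implementing the math yourself in Python, here is a minimal sketch of a least-squares line fit written from the textbook formulas (slope = covariance / variance). The function names and the sample points are made up for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Unnormalised covariance of x and y, and variance of x;
    # the 1/n factors cancel in the ratio.
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical points lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

Once the hand-written version works, comparing it against a library routine on real, noisy data is a good next exercise.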


I disagree with David: a true data scientist is an applied statistician who codes and knows how to use machine learning algorithms for the right reasons. Statistics is the base of all data science. It is the "cake", so to speak. Everything else is just icing.

The question is: what kind of data scientist do you want to be? Do you want to be a master of the subject (knowing how, why, when and when not to apply an algorithm or technique), or a Kaggle script kiddie who uses SciPy and thinks that makes them a Data Scientist?

1- Stats

2- Everything else


I do like the Berkeley course on Data Science; it will give you a good foundation and a taste for Data Science. Afterwards I moved on to Udacity, Coursera and many more resources. So if you have programming skills, then you will need math and stats and a lot of visualization. It will also be great to get used to IPython, because it is essential to see (visualize) how every step performs instead of writing a whole script and testing it afterwards (Anaconda is easy to install and work with). The course is listed here: bcourses.berkeley.edu/courses/1267848/wiki

For stats, I also found a good free course from SAS, Statistics 1: Introduction to ANOVA, Regression, and Logistic Regression: support.sas.com/edu/schedules.html?ctry=us&id=1979

To start with ML, I would recommend: www.kaggle.com/c/titanic/details/getting-started-with-python

On the left side of that page there are also versions for Excel (using pivot tables) and R. DataCamp has released a tutorial on how to use R. Once you complete these steps, there are more competitions for gaining experience on Kaggle (one for San Francisco Crime Classification was released recently), and ultimately there are amazing video tutorials from www.dataschool.io

Hope it helps...


Focus less on gaining skills and more on gaining experience. Try to actually solve some problems and post your work on GitHub. You'll learn more in the process and be able to demonstrate knowledge and experience to employers, which is much more valuable than having a supposedly deep understanding of a topic or theory.

Data Science is a pretty loaded term these days, so I'm not sure what kind of work you specifically want to do, but assuming that machine learning is a component of it, kaggle.com is a good place to start. In terms of goals, if you're able to work with data in pandas/NumPy/SciPy, build models in scikit-learn and make some pretty graphs in seaborn, ggplot or even matplotlib, then you won't have a problem getting a job from a skills perspective, especially if you have code samples and examples to demonstrate your abilities. If you get stuck, Stack Exchange will either have the answer already or you can post a question and you'll have an answer shortly. Once you're doing the work for a living you'll learn even more, likely from a senior team member who mentors you.
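As a rough idea of what that workflow looks like, here is a minimal sketch that loads data into a pandas DataFrame and fits a scikit-learn model. It assumes pandas and scikit-learn are installed. The Iris dataset and the decision tree are only stand-ins chosen for illustration; a Kaggle project would start from its own CSV files and would normally include plots as well.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in dataset as a pandas DataFrame (stand-in for real data).
iris = load_iris(as_frame=True)
df = iris.frame

# First look at the data: summary statistics per column.
print(df.describe())

# Separate the features from the column we want to predict.
X = df.drop(columns="target")
y = df["target"]

# Hold out a quarter of the rows so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a shallow decision tree and measure accuracy on the held-out rows.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```

Scoring on held-out rows rather than on the training data is the habit worth showing in code samples, since it is what separates a model that generalises from one that has memorised its input.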

Best of luck.
