What is most likely the bare minimum knowledge one needs to become a data scientist?

I am a Python developer but I want to become a data scientist.

My Question:

At its core, what is the bare minimum I need to make this transition? I know it cannot simply be a matter of learning NumPy and pandas.

My Thoughts:

I would like to frame my question around the following three perspectives and work out what is essential in each category:

  1. Technical: Analytics
  2. Technical: Computer Science
  3. Non-Technical: Soft Skills

Any help would be appreciated. :)

Topic beginner python bigdata education

Category Data Science


Just a few thoughts which aren't covered in the link I pasted above ...

  1. Big data != data science. As a data scientist you may or may not use big data tools. It wasn't clear from your question whether you already understand this, but the distinction is important.
  2. There are various careers that 'fit' in the data science spectrum. Instead of repeating them here, I wrote a short blog post a while back on building a data analytics team. People often broaden the term data scientist to include most of the roles in the team I describe.
  3. Learn some maths, if you haven't already. The analytics part of data science is very heavy in maths.
  4. Broaden your knowledge of data analytics algorithms to include some clustering and classification, as they are generally applicable, even if you think you are only going to be dealing with time series data.
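
To give a feel for how points 3 and 4 meet, here is a minimal sketch of k-means clustering written directly in NumPy. Everything in it is an assumption made for illustration: the `kmeans` helper, its parameters, and the synthetic two-blob data are not from any particular project, and a real analysis would normally use a library implementation instead.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm: alternate between assigning points and moving centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct points picked at random from the data.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        # Each point joins the cluster of its nearest centroid.
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its points; leave it in place if it has none.
        centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

# Synthetic data, made up for illustration: two well-separated blobs.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

labels, centroids = kmeans(points, k=2)
print(centroids.round(1))
```

The maths involved is just distances and means, which is why some linear algebra and statistics go a long way. In practice you would probably reach for scikit-learn's `KMeans`, which adds smarter initialisation and convergence checks.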

From personal experience (so take into consideration that I might not be representative, although I'm probably not that far off either), the people who approached me with a job offer for a Data Scientist position did so because of the following:

1) Considerable knowledge of one or more programming languages typically used for data analysis. In my case, Python.

2) Knowledge of applied mathematics (usually they don't even care about the underlying field). You just have to know how to interpret data and draw valid conclusions from it (as a starting point at least).

3) Past experience with libraries such as NumPy, SciPy, scikit-learn (very relevant), scikit-image (if you are also going to do image analysis), and pandas.

4) Past experience with data visualization libraries such as matplotlib, seaborn, Chaco, ggplot, pyQtGraph, Bokeh, etc.

5) Knowledge about regression techniques, clustering, and classification.

6) Valid extras, depending on the field, are typical applied mathematics topics such as space estimation, image analysis and processing, computer vision, and 3D visualization.

7) If you already have experience in building scientific software solutions using those programming languages, it might be a great advantage.

  • With point 7) in mind you might consider looking at PyQt5 and wxPython.

8) Ideally you are also able to present your results to an audience that is not necessarily made up of scientists only (I advise lots of illustrations; actually, now that I think about it, lots of illustrations even if it's only scientists). This takes some skill in building appropriate diagrams and figures (see vector graphics software such as Inkscape; together with plotting libraries it can work wonders).

9) Last but not least, quite a bit of flexibility (this is common for scientific and development staff). Sometimes you need to change technologies, and this takes some learning.
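
As a small illustration of points 2 and 5, here is a hedged sketch of a straight-line regression with NumPy, including a check of how much of the data the fit actually explains. The data is synthetic and invented for this example (a known line plus noise), so the "valid conclusion" can be verified against the truth.

```python
import numpy as np

# Synthetic data, made up for illustration: y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Ordinary least squares fit of a straight line (degree-1 polynomial).
slope, intercept = np.polyfit(x, y, deg=1)

# R^2: the share of the variance in y that the fitted line explains.
residuals = y - (slope * x + intercept)
r_squared = 1.0 - residuals.var() / y.var()

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

Fitting the line is the easy part. Interpreting the slope, checking the residuals, and knowing when a straight line is the wrong model is where the applied mathematics comes in. scikit-learn's `LinearRegression` covers the same ground when you have several input variables.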

Notice that my experience does not say much about web development per se. Mine is a scientific background with very little web development, so the people who approach me do so with this in mind. Other fields might require different skills (and by the way, you don't need to be a web developer to deal with web data).
