Data Science Methodologies

What are the best-known data science methodologies today? By methodology I mean a step-by-step, phased process that can be used as a guiding framework for a project, although I would be grateful for something close to that too.

To help clarify, there are methodologies in the programming world, like Extreme Programming, Feature Driven Development, Unified Process, and many more. I am looking for their equivalents, if they exist.

A Google search did not turn up much, but I find it hard to believe there is nothing out there. Any ideas?



Okay, I eventually found what I was looking for in the data mining community. There seem to be two candidates: CRISP-DM, which was developed by a consortium that included SPSS and is explicitly "cross-industry", and SEMMA, which comes from SAS. Both are pretty much what I was looking for.

CRISP-DM: http://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining

SEMMA: http://en.wikipedia.org/wiki/SEMMA
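
For quick reference, here is a minimal sketch of the phases each process defines, written as ordered enumerations. The phase names are the ones given in the published descriptions of CRISP-DM and SEMMA; the class names and the `checklist` helper are hypothetical names chosen only for illustration. Both processes are iterative in practice, so the linear ordering is a simplification.

```python
from enum import Enum


class CrispDmPhase(Enum):
    """The six CRISP-DM phases, in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


class SemmaPhase(Enum):
    """The five SEMMA phases, in their nominal order."""
    SAMPLE = 1
    EXPLORE = 2
    MODIFY = 3
    MODEL = 4
    ASSESS = 5


def checklist(phases):
    """Return the phases as numbered, human-readable checklist lines."""
    return [f"{p.value}. {p.name.replace('_', ' ').title()}" for p in phases]


if __name__ == "__main__":
    for line in checklist(CrispDmPhase):
        print(line)
```

Put side by side, CRISP-DM starts with business understanding and ends with deployment, while SEMMA covers only the steps from sampling the data to assessing the model.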


I'm currently writing a book about Data Science in Higher Education, and the following methodologies are the ones I am including:

For regression, we have:

  • Simple Linear Regression
  • Multiple Linear Regression

For classification, we have:

  • Naive Bayes Classifier
  • Decision Tree Induction
  • K-Nearest Neighbor

These are some of the more elementary topics in statistical analysis (which you could argue is predictive analytics, which in turn you could argue is data science), so I suspect they are also among the most commonly used.
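
To give a feel for how elementary two of the techniques listed above are, here is a minimal pure-Python sketch of simple linear regression (ordinary least squares with one predictor) and a k-nearest-neighbor classifier (majority vote with Euclidean distance). The function names and the toy data in the usage lines are made up for illustration; a real project would normally use an established library instead.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def fit_simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data lying exactly on the line y = 1 + 2x.
    print(fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9]))

    # Hypothetical toy data: two well-separated clusters.
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(points, labels, (0.5, 0.5)))
```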


Can you elaborate on what you mean by 'methodologies'?

In the meantime, take a look at The Field Guide To Data Science by Booz Allen Hamilton. This guide talks about data science processes and frameworks.

Data Science Design Patterns by Mosaic talks about, you guessed it, data science design patterns. This is quite useful to get a sense of common design patterns. They are also working on releasing a book on the same subject.

Then there are several resources that will come up in more targeted searches, such as machine learning paradigms or recommender systems paradigms. Data science is a large and varied field, and you'll find many resources for each subfield. As far as I know, there isn't one book that covers it all.
