Is Data Science the Same as Data Mining?

I am sure data science, as it will be discussed in this forum, has several synonyms, or at least related fields in which large data sets are analyzed.

My particular question concerns Data Mining. I took a graduate class in Data Mining a few years back. What are the differences between Data Science and Data Mining, and in particular, what more would I need to look at to become proficient in Data Mining?



@statsRus starts to lay the groundwork for your answer in another question, "What characterises the difference between data science and statistics?":

  • Data collection: web scraping and online surveys
  • Data manipulation: recoding messy data and extracting meaning from linguistic and social network data
  • Data scale: working with extremely large data sets
  • Data mining: finding patterns in large, complex data sets, with an emphasis on algorithmic techniques
  • Data communication: helping turn "machine-readable" data into "human-readable" information via visualization

Definition: data mining can be seen as one item (or set of skills and applications) in the toolkit of the data scientist. I like how he separates the definition of mining from collection, in a sort of trade-specific jargon.

However, I think that data mining would be synonymous with data collection in colloquial US English.

As for where to go to become proficient, I think that question is too broad as currently stated and would receive answers that are primarily opinion-based. If you could refine your question, it might be easier to see what you are asking.


My answer would be no. I consider data mining to be one of the various fields within data science. Data mining is mostly concerned with raising questions rather than answering them. It is often described as "detecting something new," whereas in data science the data scientist tries to solve complex problems in order to reach an end result. However, the two terms have a lot in common. For example, suppose you have agricultural land and want to find the affected plants. Spatial data mining plays a key role in this job. There is a good chance that you will find not only the affected plants but also the extent to which they are affected. This is something that is not possible with data science.


What @Clayton posted seems about right to me, for those terms, and for "data mining" being one tool of the data scientist. However, I haven't really used the term "data collection," and it doesn't strike me as synonymous with "data mining."

My own answer to your question: no, the terms aren't the same. Definitions may be loose in this field, but I haven't seen those terms used interchangeably. In my work, we sometimes use them to differentiate between goals or methodologies. For us, data science is more about testing a hypothesis, and typically the data have been collected just for that purpose. Data mining is more about sifting through existing data, looking for structure, and perhaps generating hypotheses. Data mining can start with a hypothesis, but it's often very weak or general, and can be difficult to resolve with confidence. (Dig long enough and you'll find something, though it may turn out to be pyrite.)

However, we also have used "data science" as a wider term, to include "data mining." We also talk about "data modeling," which for us is about finding a model for a system of interest, based on data as well as other knowledge and objectives. Sometimes that means trying to find the math that explains the real system, and sometimes it means finding a predictive model that is good enough for a purpose.
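To make the "pyrite" remark above concrete, here is a minimal sketch in Python. It is my own illustration, not something from my work: the function names, sample sizes, and the choice of Pearson correlation are all assumptions. It compares checking one pre-chosen feature against a target (hypothesis-driven) with searching many unrelated features for the best match (mining), where every column is pure random noise.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_chance_correlation(n_rows, n_features, seed=0):
    """Mine pure noise (hypothetical helper): return the largest |r| between
    a random target and any of n_features random, unrelated columns."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        column = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(column, target)))
    return best


# Hypothesis-driven: one pre-specified feature is checked once.
one_test = strongest_chance_correlation(n_rows=30, n_features=1)
# Mining: the same noise is searched across many candidate features.
many_tests = strongest_chance_correlation(n_rows=30, n_features=1000)

print(f"one pre-chosen feature: |r| = {one_test:.2f}")
print(f"best of 1000 features:  |r| = {many_tests:.2f}")
```

Because both calls use the same seed, the single pre-chosen feature is also the first of the 1000, so the mined value can never be smaller. Any "pattern" found this way is chance alone, which is why a mined result usually needs to be confirmed on fresh data.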


There are many overlaps between data mining and data science. I would say that people in a data mining role are concerned with data collection and the extraction of features from unfiltered, unorganised and mostly raw/wild datasets. Some very important data may be difficult to extract, not due to implementation issues but because it may contain foreign artifacts.

For example, if I needed someone to look at financial data from written tax returns from the 1970s, which had been scanned and machine-read, to find out whether people saved more on car insurance, a data miner would be the person to get.

If I needed someone to examine the influence of Nike's Twitter profile on tweets from Brazil and identify key positive features of the profile, I would look for a data scientist.
