[Note: essentially my answer is the same as @ncasas, just an alternative phrasing]
Classification belongs to supervised learning whereas clustering belongs to unsupervised learning:
- In supervised learning there is a training stage during which some instances (examples) are provided together with their answer (the target). During training the model "studies" all the examples in the training data (represented with features) in order to be able to find the target from the features. After it has been trained, the model can be applied to new instances and use their features to predict their target. In short the main characteristics of supervised learning are:
    - The goal is to predict a specific piece of information defined from the start (the target).
    - It requires some training data: features and answers for a large set of instances.
- In unsupervised learning the goal is to discover the patterns within the data. There is no predefined target and no training stage (thus no need for annotated data). Unsupervised learning can only do general tasks based on comparing instances, such as clustering (grouping similar instances together) or ranking (ordering instances relative to each other).
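To make the contrast concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the toy points, the `low`/`high` labels, the nearest-centroid classifier and the bare-bones k-means (seeded with the first `k` points so the result is deterministic) are all made up for this example. The thing to notice is that `train_classifier` needs the targets, whereas `cluster` only ever sees the features.

```python
from math import dist
from statistics import mean

# Toy data (hypothetical): 2-D features and, for the supervised case only, targets.
features = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.9, 1.1), (4.8, 5.1)]
targets = ["low", "high", "low", "high", "low", "high"]


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(mean(axis) for axis in zip(*points))


def train_classifier(features, targets):
    """Supervised: learn one centroid per predefined target label."""
    by_label = {}
    for x, y in zip(features, targets):
        by_label.setdefault(y, []).append(x)
    return {label: centroid(points) for label, points in by_label.items()}


def predict(model, x):
    """Return the label whose centroid is closest to the new instance x."""
    return min(model, key=lambda label: dist(model[label], x))


def assign(centers, features):
    """Index of the nearest center for every instance."""
    return [min(range(len(centers)), key=lambda i: dist(centers[i], x)) for x in features]


def cluster(features, k, iterations=10):
    """Unsupervised: bare-bones k-means, seeded with the first k points (an assumption)."""
    centers = list(features[:k])
    for _ in range(iterations):
        assignment = assign(centers, features)
        for i in range(k):
            members = [x for x, a in zip(features, assignment) if a == i]
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = centroid(members)
    return assign(centers, features)


model = train_classifier(features, targets)  # needs the answers
prediction = predict(model, (1.1, 0.9))      # -> "low"
clusters = cluster(features, 2)              # no answers involved -> [0, 1, 0, 1, 0, 1]
print(prediction, clusters)
```

The classifier returns one of the labels it was given, while the clustering only returns arbitrary group ids: nothing tells it what the groups mean.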
This is the fundamental difference between classification and clustering. Based on this understanding:
What's the difference between data classification and clustering (from a Data point of view)
From a strict data point of view, the difference is the requirement for annotated data in classification. There is no such requirement for clustering.
Is data classification a sub topic of data clustering ?
No, because they belong to different families of ML which have different goals.
Example:
- In spam classification (supervised task) a model is trained with some documents (usually emails) labelled as spam or not spam. The resulting model can predict whether a new document is spam or not.
- In topic modelling (unsupervised task) a model groups semantically similar documents together, based on the words they contain.
The first task separates documents into classes, but these classes are predefined: here spam vs. non-spam. The model uses features specifically as indicators for this goal. It would use features in a completely different way if the classes were news vs. entertainment, business vs. personal, or sci-fi vs. romance. Hence the term supervised learning: the model focuses on what it is told (trained) to focus on.
Topic modelling separates documents into several clusters, but even if we assume exactly two clusters these are extremely unlikely to correspond to spam vs. non-spam (or news vs. entertainment, etc.). A clustering algorithm follows a neutral similarity method which uses the features indiscriminately. The main outcome is the clusters themselves, which represent unknown patterns in the data. For example, applying topic modelling to a large collection of documents may reveal the main categories of documents: the new knowledge is the existence of these groups. Clustering is unsupervised because it doesn't follow a predetermined goal.
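The same point can be sketched with a few toy emails. Everything below is hypothetical and heavily simplified: the four emails, the word-weight scheme (spam count minus non-spam count) and the Jaccard threshold of 0.3 are my own assumptions, and the grouping function is a crude stand-in for clustering, not an actual topic model. It should still show that the supervised model picks up the spam indicators and ignores topical words, while the unsupervised grouping ends up following the topics (travel vs. finance) rather than spam vs. non-spam.

```python
from collections import Counter

# Hypothetical labelled emails: two topics (travel, finance), each with spam and non-spam.
emails = [
    ("win cheap flight tickets hotel beach", "spam"),
    ("flight tickets hotel booking confirmed", "not spam"),
    ("win cheap loan credit bank rates", "spam"),
    ("loan credit bank statement attached", "not spam"),
]


def train_spam_weights(labelled_emails):
    """Supervised: weight each word by (spam count - non-spam count)."""
    weights = Counter()
    for text, label in labelled_emails:
        for word in set(text.split()):
            weights[word] += 1 if label == "spam" else -1
    return weights


def classify(weights, text):
    """Predict the predefined target from the summed word weights."""
    score = sum(weights[word] for word in text.split())
    return "spam" if score > 0 else "not spam"


def jaccard(a, b):
    """Word-overlap similarity between two texts."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)


def group_similar(texts, threshold=0.3):
    """Unsupervised: merge texts whose similarity reaches the threshold."""
    group_of = list(range(len(texts)))  # every text starts in its own group
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if jaccard(texts[i], texts[j]) >= threshold:
                old, new = group_of[j], group_of[i]
                group_of = [new if g == old else g for g in group_of]
    return group_of


weights = train_spam_weights(emails)
print(classify(weights, "win cheap hotel"))          # spam
print(classify(weights, "hotel booking confirmed"))  # not spam

groups = group_similar([text for text, _ in emails])
print(groups)  # [0, 0, 2, 2]: grouped by topic, not by spam label
```

Note that topical words such as "flight" or "loan" end up with a weight of 0 in the classifier because they occur equally often in both classes, whereas they are exactly what drives the grouping.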