(I actually wanted to write this as an answer to the Cross Validated question: Difference between Anomaly and Outlier, but the question is protected - I think answering it here should be fine, despite the lower visibility)
People occasionally argue that there is no difference between an outlier and an anomaly by citing Charu Aggarwal, author of the book "Outlier Analysis" - in particular, this statement:
Outliers are also referred to as abnormalities, discordants, deviants, or anomalies in the data mining and statistics literature.
(Source: "Outlier Analysis" (Springer), Charu Aggarwal, 2017, http://charuaggarwal.net/outlierbook.pdf )
However, this statement does not imply that outliers and anomalies are the same thing, just as saying "Dogs are sometimes referred to as animals" does not mean that dogs and animals are the same thing.
It's hard to give a formal definition of the terms. The Wikipedia page about outliers refers to the Wikipedia page about anomaly detection and vice versa, and they both contain lots of possible definitions and interpretations of the terms. Domain-specific definitions and colloquialisms make things worse: it often seems to be sufficient that two people from the same field roughly know what the other one is talking about...
However, Varun Chandola tries to give a more precise meaning to the term "anomaly" in his anomaly detection survey. In particular, he classifies anomalies into three categories:
- Point anomalies: An individual data instance can be considered as anomalous with respect to the rest of data
- Contextual anomalies: If a data instance is anomalous in a specific context (but not otherwise)
- Collective anomalies: If a collection of related data instances is anomalous with respect to the entire data set
(Summarized from "Anomaly Detection - A Survey", Varun Chandola et al, ACM Computing Surveys 2009, http://cucis.ece.northwestern.edu/projects/DMS/publications/AnomalyDetection.pdf )
Here, the term "point anomaly" seems to be closest to what I'd consider a possible definition of the word "outlier". And this is in line with the statement by Aggarwal: An outlier is an anomaly, but not every anomaly is an outlier.
(The latter may depend on the definition of the word outlier. Of course, one can define it on a meta-level, and say that an outlier is whatever a certain outlier detection algorithm (or model) detects as such. But most definitions that I encountered so far are based on some sort of "distance", "dissimilarity", or "difference" from a "majority" of other data elements. That sounds reasonable...)
An example: There may be several data points:
14.5, 14.2, 14.4, 14.4, 14.4, 14.4, 14.4, 14.4, 14.4, 14.3, 14.2, 14.6
One can compute the mean and standard deviation and will have a hard time arguing why one of these points should be an "outlier".
For a sequence of data points like this
14.5, 14.2, 14.4, 14.4, -64564.4, 14.4, 14.4, 14.4, 14.4, 14.3, 14.2, 14.6
spotting "the outlier" should be easy.
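To make the "mean and standard deviation" argument concrete, here is a minimal sketch that flags values by their z-score for the two sequences above. The function name `z_score_outliers` and the threshold of 3 are my own assumptions for illustration, not a standard definition of an outlier:

```python
from statistics import mean, pstdev

def z_score_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold.

    The threshold of 3 is an assumed, commonly used rule of thumb.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, nothing can stand out
    return [v for v in values if abs(v - mu) / sigma > threshold]

first = [14.5, 14.2, 14.4, 14.4, 14.4, 14.4, 14.4, 14.4, 14.4, 14.3, 14.2, 14.6]
second = [14.5, 14.2, 14.4, 14.4, -64564.4, 14.4, 14.4, 14.4, 14.4, 14.3, 14.2, 14.6]

print(z_score_outliers(first))   # []
print(z_score_outliers(second))  # [-64564.4]
```

With this threshold, nothing in the first sequence is flagged (the largest z-score there is about 2), while the second sequence yields exactly the one obvious value. Note that for only 12 data points, a population z-score can never exceed sqrt(11) (about 3.32), so a threshold of 3 is only just workable here; a different rule would be needed for smaller samples.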
However, assuming that the first sequence describes, for example, average daily outside temperatures, the fact that the exact same average temperature of 14.4 degrees was measured for a whole week could certainly be considered as an "anomaly".
(Probably a "collective anomaly" according to the definitions above, but I won't argue about that...)
Although I'm on thin ice when arguing about the precise or intuitive meaning of certain terms (because I'm neither a data science expert nor a native English speaker), this would mean that "anomaly" is a much broader term than "outlier". But maybe the data science community is just in the process of sorting out proper definitions of these terms.
Update:
Maybe my gut feeling about the literal meaning of certain words is wrong. But for me, the word "outlier" seems to say "lying somewhere out of (or far away from) something (based on some distance measure)". In that sense, the 14.4s in the first example are not "outliers" per se. But of course, things become tricky very quickly here: One could imagine a model for the data that contains the number of consecutive days with equal temperatures (as in a run length encoding). Computing this model for the given data would yield
1 * 14.5
1 * 14.2
7 * 14.4
1 * 14.3
1 * 14.2
1 * 14.6
where the value 7 does have a large distance (difference) to the other values in the model. So the "collective anomaly" of 7 consecutive days with equal temperatures has been turned into a "point anomaly" by this transformation.
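The run length model above can be computed with a few lines of code. This is only a sketch of the transformation; the helper name `run_lengths` is made up for this example, and it relies on exact equality of the measured values, which real temperature data would rarely offer:

```python
from itertools import groupby

def run_lengths(values):
    """Run length encode the values as (run length, value) pairs."""
    # groupby groups *consecutive* equal elements, which is exactly
    # what a run length encoding needs
    return [(len(list(group)), value) for value, group in groupby(values)]

temps = [14.5, 14.2, 14.4, 14.4, 14.4, 14.4, 14.4, 14.4, 14.4, 14.3, 14.2, 14.6]

runs = run_lengths(temps)
for length, value in runs:
    print(f"{length} * {value}")

# The run that lies farthest from the others in terms of its length
longest = max(runs, key=lambda run: run[0])
print(longest)  # (7, 14.4)
```

In the transformed data, the run lengths are 1, 1, 7, 1, 1, 1, so the question of whether a week of identical temperatures is unusual becomes a question about a single value among six.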