Are cluster feature and micro-cluster good summary statistics for outlier detection in high dimensional data streams?

Question

I Sui

January 19, 2022, 05:16

I'm dealing with outlier detection in data streams. I'm looking for a way to summarize my data and obtain important statistics such as the mean and variance. I want to know whether cluster features or micro-clusters are suitable for this.

Topic anomaly anomaly-detection outlier data-stream-mining clustering

Category Data Science

Ashwiniku918 · Accepted Answer · January 19, 2022, 05:16

Traditional clustering algorithms that rely on Euclidean distance fail to yield good results on high-dimensional data because of the curse of dimensionality.

As the number of dimensions grows, the mean distance between data points diverges and pairwise distances become nearly indistinguishable from one another, so the Euclidean distance, the most common distance used for clustering, loses its meaning.
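To make this concrete, here is a minimal sketch of the effect, assuming points drawn uniformly at random from the unit hypercube (an assumption for illustration, not something taken from your data). The helper `relative_contrast` is a hypothetical name; it measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random

def relative_contrast(dim, n_points=300, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points uniform random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# The contrast between nearest and farthest neighbour shrinks as dim grows.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

On synthetic data like this the printed contrast should drop sharply as the dimension goes up, which is why "nearest" stops being informative. Real data with lower intrinsic dimensionality may behave better, so it is worth running the same check on a sample of your own stream.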

So if you are using any Euclidean-based clustering algorithm, I would strongly advise against it.

But if your clustering algorithm is not affected by the high-dimensionality problem, for example hierarchical DBSCAN (HDBSCAN), you can do what you are suggesting.

Has QUIT--Anony-Mousse · Accepted Answer · December 26, 2019, 08:40

No.

Because assignment to microclusters is distance-based, and distances no longer work well in high-dimensional data. Most likely, one microcluster will become the most central by chance and collect all the samples.
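For reference, a cluster feature in the BIRCH sense is the additive triple (N, LS, SS): the point count, the per-dimension linear sum, and the per-dimension sum of squares. The sketch below is a minimal illustration, not any particular library's implementation; the names `ClusterFeature` and `nearest` are hypothetical. It shows that the mean and variance do fall out of the summary cheaply, and that the weak spot is the Euclidean assignment step in `nearest`.

```python
import math

class ClusterFeature:
    """Additive summary (N, LS, SS) of the points absorbed so far."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim  # per-dimension linear sum
        self.ss = [0.0] * dim  # per-dimension sum of squares

    def add(self, x):
        """Absorb one point in O(dim) time without storing it."""
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def mean(self):
        return [s / self.n for s in self.ls]

    def variance(self):
        # Population variance per dimension, E[x^2] - E[x]^2.
        # Clamped at 0 because this formula can go slightly negative
        # through floating-point cancellation.
        return [max(q / self.n - (s / self.n) ** 2, 0.0)
                for s, q in zip(self.ls, self.ss)]

def nearest(summaries, x):
    """Index of the non-empty summary whose centroid is closest to x.
    This Euclidean step is the part that degrades in high dimensions."""
    candidates = [i for i, cf in enumerate(summaries) if cf.n > 0]
    return min(candidates, key=lambda i: math.dist(summaries[i].mean(), x))

cf = ClusterFeature(dim=2)
for p in [(1.0, 2.0), (3.0, 2.0), (5.0, 2.0)]:
    cf.add(p)
print(cf.mean())      # [3.0, 2.0]
print(cf.variance())
```

So the statistics themselves are fine as summaries; the question is whether the points routed into each summary by `nearest` still form a meaningful group once distances stop discriminating.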
