Sampling trying to keep as much multivariate variance as possible

Question

PascalVKooten

Mar 11, 2022 07:06

I was wondering whether anyone has considered a sampling technique that aims to keep as much of the variance as possible (e.g., as many unique values as possible, or continuous variables that are as widely distributed as possible).
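For the "as many unique values" part, one possible approach (not taken from this thread) is the standard greedy heuristic for maximum coverage: repeatedly pick the row that adds the most not-yet-seen (column, value) pairs. This is a minimal sketch that assumes rows are plain tuples of hashable values; the function name `cover_unique_values` is hypothetical.

```python
def cover_unique_values(rows, k):
    """Greedily pick up to k row indices covering as many unseen (column, value) pairs as possible.

    Assumption: each row is a tuple of hashable values (e.g. categorical columns).
    """
    seen = set()
    chosen = []
    remaining = set(range(len(rows)))

    def new_pairs(i):
        # (column, value) pairs in row i that the sample does not contain yet.
        return {(c, v) for c, v in enumerate(rows[i])} - seen

    while remaining and len(chosen) < k:
        # Largest gain first; ties go to the smallest row index for determinism.
        best = max(remaining, key=lambda i: (len(new_pairs(i)), -i))
        gain = new_pairs(best)
        if not gain:
            break  # every remaining row only repeats values already covered
        chosen.append(best)
        seen |= gain
        remaining.remove(best)
    return chosen


rows = [("a", "x"), ("a", "y"), ("b", "x"), ("c", "z")]
print(cover_unique_values(rows, 3))
```

Rows that add nothing new are never picked, so the sample can end up smaller than `k` once every distinct value has been covered.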

The benefit would be that you can develop code around the sample while really working with the edge cases in the data.

You can always take a representative sample later.

So I am wondering whether people have tried sampling for maximum variance before, and whether there is a clever way to draw a sample with as high a variance as possible (an approximation is of course fine).
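For widely spread continuous variables, one common heuristic (again not from this thread) is farthest-point, or greedy max-min, sampling: start from a random row, then repeatedly add the row whose distance to its nearest already-chosen row is largest. This spreads the sample across the data and pulls in outlying rows; it does not literally maximize the sample variance. The sketch assumes a 2-D, purely numeric matrix and Euclidean distance on standardized columns, and `farthest_point_sample` is a hypothetical name.

```python
import numpy as np


def farthest_point_sample(X, k, seed=0):
    """Return up to k row indices of a 2-D numeric matrix X via greedy max-min sampling."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    k = min(k, n)
    if k <= 0:
        return []
    # Standardize columns so no single feature dominates the distance
    # (assumption: all columns are numeric; constant columns become all zeros).
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std

    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(n))]
    # min_dist[i] = distance from row i to its nearest already-chosen row.
    min_dist = np.linalg.norm(Z - Z[chosen[0]], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))  # row farthest from the current sample
        if min_dist[nxt] == 0:
            break  # every remaining row duplicates a chosen row
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(Z - Z[nxt], axis=1))
    return chosen


X = [[0, 0], [0.1, 0], [10, 0], [0, 10], [10, 10]]
print(farthest_point_sample(X, 4))
```

On this toy matrix, rows 0 and 1 are near-duplicates, so a sample of four keeps only one of them and fills the other slots with the far-apart rows. Each step is one pass over the data, so the cost grows roughly as n times k.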

Topics: multivariate-distribution, variance, sampling

Category: Data Science

Brian Spiering · Accepted Answer · Apr 26, 2020 15:33

It depends on what you mean by sampling: do you mean sampling between features or within features?

For between features, scikit-learn has a built-in VarianceThreshold transformer, which removes features whose variance does not meet a given threshold.
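A minimal illustration of `sklearn.feature_selection.VarianceThreshold` on a small toy matrix made up for this example. Note that it selects columns (features), not rows, so it covers the "between features" reading of the question rather than drawing a high-variance sample of rows.

```python
from sklearn.feature_selection import VarianceThreshold

# Toy matrix made up for this example: columns 0 and 3 are constant.
X = [[0, 2, 0, 3],
     [0, 1, 4, 3],
     [0, 1, 1, 3]]

# The default threshold=0.0 keeps only features with non-zero variance.
selector = VarianceThreshold()
X_reduced = selector.fit_transform(X)
print(selector.get_support())  # [False  True  True False]
print(X_reduced)               # keeps columns 1 and 2

# A stricter threshold also drops column 1 (variance 2/9) but keeps column 2 (variance 26/9).
strict = VarianceThreshold(threshold=0.5).fit(X)
print(strict.variances_)
print(strict.get_support())    # [False False  True False]
```

The per-column variances are population variances (`ddof=0`), and a column is kept only if its variance is strictly greater than `threshold`.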
