Getting a balanced sample across many variables
Let’s say each element in my population has several attributes. Let’s call them A, B, C, D, E, F.
Let’s say, for simplicity, that each attribute takes 10 possible values (in practice it could be anywhere between 2 and 30). I want to draw a sample in which the distribution of every attribute matches its distribution in the whole population. For example, if about 15% of the population has value 1 for attribute A, then about 15% of my sample should have value 1 for attribute A as well.
How should I choose the sample size, and how do I select a sample that has these properties?
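To make the requirement concrete, here is a minimal sketch of how the match between population and sample could be measured. Everything in it is an assumption for illustration: the population is synthetic with uniformly distributed values, the names `make_population`, `marginal` and `max_marginal_gap` are hypothetical, and plain simple random sampling is used only as a baseline, not as the method I am asking for.

```python
import random
from collections import Counter

ATTRIBUTES = ["A", "B", "C", "D", "E", "F"]  # attribute names from the question

def make_population(n, n_values=10, seed=0):
    """Build a synthetic population (assumption): each element maps
    attribute -> value drawn uniformly from 1..n_values."""
    rng = random.Random(seed)
    return [{a: rng.randint(1, n_values) for a in ATTRIBUTES} for _ in range(n)]

def marginal(rows, attribute):
    """Return the share of rows taking each value of one attribute."""
    counts = Counter(row[attribute] for row in rows)
    total = len(rows)
    return {value: count / total for value, count in counts.items()}

def max_marginal_gap(population, sample):
    """Largest absolute difference between population and sample shares,
    taken over every attribute and every value."""
    gap = 0.0
    for a in ATTRIBUTES:
        pop, smp = marginal(population, a), marginal(sample, a)
        for value in pop.keys() | smp.keys():
            gap = max(gap, abs(pop.get(value, 0.0) - smp.get(value, 0.0)))
    return gap

population = make_population(100_000)
# Baseline only: a simple random sample without replacement.
sample = random.Random(1).sample(population, 5_000)
print(max_marginal_gap(population, sample))
```

A gap of 0 would mean every attribute has exactly the same distribution in the sample as in the population. Rerunning this with different sample sizes shows how the gap shrinks as the sample grows, which is one way to think about the sample-size part of the question.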
Topic multivariate-distribution distribution sampling statistics
Category Data Science