A/B testing with non-Gaussian distributions

Question

anishtain4

May 6, 2022 06:01

I have two sets of samples (A, B), each with a relatively large number of points (~10,000), and I want to see whether a factor has affected sample B or not. Naturally, I should use A/B testing. The problem is that the distributions are not normal and I'm interested in the maximum change, not the mean values! So if all you know is how the CLT is going to make everything Gaussian, this is a good point to stop and move on to the next question. The data are distances, so there's a minimum of 0, but there's no maximum and no guarantee of what the distribution is going to look like. As an example, the histograms look like this:

My gut feeling tells me that the maximum of the orange sample is just randomly higher than that of the blue one, but gut feelings are usually wrong. So I want to have some method of testing this. I would appreciate any input.

PS: Welch's t-test tells me that with 100.000% confidence, these two distributions are different, but are they?

Topic ab-test statistics

Category Data Science

Brian Spiering · Accepted Answer · March 20, 2020 22:26

One option is a permutation test. A permutation test does not assume any particular distribution for the data (only that the labels are exchangeable under the null hypothesis), and it allows for testing the maximum change.

For a permutation test, you randomly reassign the data points to the two labels and then recalculate the maximum change under the null hypothesis. Repeat this many times to build up a null distribution, until you are confident that the observed difference is or is not likely to happen by chance.
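The procedure described above can be sketched in a few lines of NumPy. This is only an illustration under some assumptions: the function name `permutation_test_max`, the choice of "difference between the two sample maxima" as the test statistic, the two-sided comparison, and the synthetic exponential data are all made up for the example and are not from the question.

```python
import numpy as np

def permutation_test_max(a, b, n_permutations=1000, seed=0):
    """Two-sided permutation test on max(b) - max(a).

    Hypothetical helper for illustration; the statistic is an assumption
    about what "maximum change" means for the data at hand.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    observed = b.max() - a.max()
    pooled = np.concatenate([a, b])
    n_a = a.size

    # Count shuffles whose statistic is at least as extreme as the observed one.
    extreme = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)  # random reassignment of labels
        diff = shuffled[n_a:].max() - shuffled[:n_a].max()
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Synthetic stand-in for the distance data (non-negative, right-skewed).
rng = np.random.default_rng(42)
sample_a = rng.exponential(scale=1.0, size=10_000)
sample_b = rng.exponential(scale=1.0, size=10_000)

observed, p_value = permutation_test_max(sample_a, sample_b)
print(f"observed max difference: {observed:.3f}, p-value: {p_value:.3f}")
```

A large p-value would suggest that a difference in maxima of this size shows up routinely when the labels are shuffled. Since a sample maximum is driven by a handful of points, the same function could be adapted to a high quantile (for example via `np.quantile`) if the maximum turns out to be too noisy.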

ripintheblue · Accepted Answer · October 22, 2019 13:28

Yeah, looks like it. Perhaps your data depends on some initial random variables, which in turn affect the overall distribution.

ripintheblue · Accepted Answer · October 22, 2019 06:39

Welch's t-test assumes normally distributed data. I'd assume your sample size is big enough to show that these two distributions are different, based on the differences in mean, variance, and range.
