I assume that when you say "I need two[sic] perform a few tests on the same group of users I might have problems", you intend to re-randomise that group of users into new control and treatment groups each time you start a new test.
A possible middle ground is to randomise twice: first randomise users into many buckets, then randomise buckets into treatment/control groups whenever you need to run a new test.
There are a couple of flavours to this as well:
1. "Deterministically" assign user into buckets (based on some rules on an sufficiently random identifier that is sufficiently independent from user feature(s) and the treatment), randomly pick buckets to test:
For example, the experimenter can create 10 buckets based on the last digit of the user ID. This assumes you have enough users that the last digit is uncorrelated with any hourly/daily seasonality. For each test, the experimenter then randomly selects 5 buckets for the control group and 5 buckets for the treatment group (a code sketch of this follows the list below), e.g.:
- Test 1 - Control: 5, 9, 6, 0, 3; Treatment: 4, 8, 1, 7, 2
- Test 2 - Control: 1, 7, 2, 6, 9; Treatment: 3, 4, 8, 5, 0
- Test 3 - Control: 7, 4, 2, 6, 3; Treatment: 0, 9, 1, 5, 8
...and so on [1].
2. Randomly assign users to buckets, then deterministically or randomly pick buckets to test:
This is the approach used by many large tech companies' experimentation platforms; see, e.g., Figure 1 of [2] or [3]. In [2] they use both users and user clusters as experimentation units, but the bucketing principle is the same.
Users are randomly assigned to, say, 1,000 buckets based on some hash, and their bucket assignments are recorded. For each experiment, the experimenter decides how many buckets they need (e.g. 100 each for control and treatment) and either picks those buckets themselves or has the experimentation platform assign them at random (see the sketch after this list).
This approach has the advantage that the experimenter can enforce some form of bucket exclusion, i.e. prevent users who are participating in one experiment from also joining another that is about to start, in case the treatments interact with each other.
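To make the two flavours concrete, here is a minimal Python sketch. The function names, the salt, and the choice of SHA-256 are my own illustrative assumptions, not what the platforms in [2] and [3] actually do:

```python
import hashlib
import random

def bucket_by_last_digit(user_id: int) -> int:
    # Flavour 1: "deterministic" bucketing on the last digit of the user ID,
    # giving 10 buckets (0-9).
    return user_id % 10

def bucket_by_hash(user_id: str, n_buckets: int = 1000, salt: str = "bucketing-v1") -> int:
    # Flavour 2: hash-based bucketing into, say, 1,000 buckets. The salt keeps
    # the assignment stable but unrelated to hashes used for other purposes.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n_buckets

def split_buckets(n_buckets, n_per_group, seed=None):
    # For a new test, randomly pick which buckets form the control and
    # treatment groups.
    rng = random.Random(seed)
    chosen = rng.sample(range(n_buckets), 2 * n_per_group)
    return set(chosen[:n_per_group]), set(chosen[n_per_group:])

# Flavour 1: 10 last-digit buckets, 5 per group (as in the Test 1-3 example).
control, treatment = split_buckets(n_buckets=10, n_per_group=5)
group = "control" if bucket_by_last_digit(123456789) in control else "treatment"

# Flavour 2: 1,000 hash buckets, 100 per group; the remaining buckets sit
# outside this particular test.
control, treatment = split_buckets(n_buckets=1000, n_per_group=100)
bucket = bucket_by_hash("user-42")
group = ("control" if bucket in control
         else "treatment" if bucket in treatment
         else "not in test")
```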
Theoretically, both approaches lead to a randomised assignment between the control and treatment groups. Of course, this assumes you have enough users and that your bucketing implementation is correct, both of which need to be checked carefully.
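As one example of such a check (my own suggestion, not something prescribed by the references), you could run a chi-square goodness-of-fit test on the recorded bucket sizes to confirm they are consistent with a uniform allocation:

```python
from collections import Counter
from scipy import stats

def check_bucket_balance(bucket_assignments, n_buckets):
    # bucket_assignments: a list of the recorded bucket IDs, one per user.
    counts = Counter(bucket_assignments)
    observed = [counts.get(b, 0) for b in range(n_buckets)]
    expected = [len(bucket_assignments) / n_buckets] * n_buckets
    chi2, p_value = stats.chisquare(observed, f_exp=expected)
    # A very small p-value hints at a biased hash or an implementation bug,
    # similar in spirit to a sample ratio mismatch check.
    return chi2, p_value
```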
[1] I used the Random Integer Set Generator with the options:
- Generate 5 set(s) with 10 unique random integer(s) in each.
- Each integer should have a value between 0 and 9...
- ✔️ Use commas to separate the set members
- ⚫ Print the sets in the order they were generated
[2] B. Karrer et al., Network experimentation at scale, In: KDD'21. Available: https://arxiv.org/pdf/2012.08591.pdf
[3] J. Rydberg, Spotify’s New Experimentation Platform (Part 2). Available: https://engineering.atspotify.com/2020/11/02/spotifys-new-experimentation-platform-part-2/