I assume that when you say "I need two[sic] perform a few tests on the same group of users I might have problems", you intend to re-randomise that group of users into new control and treatment groups each time you start a new test.
A possible middle ground is to randomise twice: first randomise users into many buckets, then randomise buckets into treatment/control groups whenever you need to run a new test.
There are a couple of flavours to this as well:
1. "Deterministically" assign user into buckets (based on some rules on an sufficiently random identifier that is sufficiently independent from user feature(s) and the treatment), randomly pick buckets to test:
For example, the experimenter can create 10 buckets based on the last digit of the user ID. This assumes you have enough users that the last digit is uncorrelated with any hourly/daily seasonality. For each test, the experimenter then randomly selects 5 buckets for the control group and 5 buckets for the treatment group (a code sketch of this follows the list below), e.g.:
- Test 1 - Control: 5, 9, 6, 0, 3; Treatment: 4, 8, 1, 7, 2
- Test 2 - Control: 1, 7, 2, 6, 9; Treatment: 3, 4, 8, 5, 0
- Test 3 - Control: 7, 4, 2, 6, 3; Treatment: 0, 9, 1, 5, 8
...and so on [1].
2. Randomly assign users to buckets, then deterministically or randomly pick buckets to test:
This is the approach used by many large tech companies' experimentation platforms; see, e.g., Figure 1 of [2] or [3]. In [2] they use both users and user clusters as experimentation units, but the bucketing principle is the same.
Users are randomly assigned to, say, 1,000 buckets based on some hash, and their bucket assignments are recorded. For each experiment, the experimenter decides how many buckets they need (e.g. 100 each for control and treatment) and either picks those buckets themselves or has the experimentation platform assign them at random (see the sketch after this list).
This approach has the advantage that the experimenter can enforce some form of bucket exclusion, i.e. prevent users who are participating in one experiment from also joining another that is about to start, in case the treatments interact with each other.
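To make the two flavours concrete, here is a minimal Python sketch. The function names, the salt, and the choice of SHA-256 are my own illustrative assumptions, not what the platforms in [2] and [3] actually do:

```python
import hashlib
import random

def bucket_by_last_digit(user_id: int) -> int:
    # Flavour 1: "deterministic" bucketing on the last digit of the user ID,
    # giving 10 buckets (0-9).
    return user_id % 10

def bucket_by_hash(user_id: str, n_buckets: int = 1000, salt: str = "bucketing-v1") -> int:
    # Flavour 2: hash-based bucketing into, say, 1,000 buckets. The salt keeps
    # the assignment stable but unrelated to hashes used for other purposes.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n_buckets

def split_buckets(n_buckets, n_per_group, seed=None):
    # For a new test, randomly pick which buckets form the control and
    # treatment groups.
    rng = random.Random(seed)
    chosen = rng.sample(range(n_buckets), 2 * n_per_group)
    return set(chosen[:n_per_group]), set(chosen[n_per_group:])

# Flavour 1: 10 last-digit buckets, 5 per group (as in the Test 1-3 example).
control, treatment = split_buckets(n_buckets=10, n_per_group=5)
group = "control" if bucket_by_last_digit(123456789) in control else "treatment"

# Flavour 2: 1,000 hash buckets, 100 per group; the remaining buckets sit
# outside this particular test.
control, treatment = split_buckets(n_buckets=1000, n_per_group=100)
bucket = bucket_by_hash("user-42")
group = ("control" if bucket in control
         else "treatment" if bucket in treatment
         else "not in test")
```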
Theoretically, both approaches lead to a randomised assignment between the control and treatment groups. Of course, this assumes you have enough users and that your bucketing implementation is correct, both of which need to be checked carefully.
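As one example of such a check (my own suggestion, not something prescribed by the references), you could run a chi-square goodness-of-fit test on the recorded bucket sizes to confirm they are consistent with a uniform allocation:

```python
from collections import Counter
from scipy import stats

def check_bucket_balance(bucket_assignments, n_buckets):
    # bucket_assignments: a list of the recorded bucket IDs, one per user.
    counts = Counter(bucket_assignments)
    observed = [counts.get(b, 0) for b in range(n_buckets)]
    expected = [len(bucket_assignments) / n_buckets] * n_buckets
    chi2, p_value = stats.chisquare(observed, f_exp=expected)
    # A very small p-value hints at a biased hash or an implementation bug,
    # similar in spirit to a sample ratio mismatch check.
    return chi2, p_value
```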
[1] I used the Random Integer Set Generator with the options:
- Generate 5 set(s) with 10 unique random integer(s) in each.
- Each integer should have a value between 0 and 9...
- ✔️ Use commas to separate the set members
- ⚫ Print the sets in the order they were generated
[2] B. Karrer et al., Network experimentation at scale, In: KDD'21. Available: https://arxiv.org/pdf/2012.08591.pdf
[3] J. Rydberg, Spotify’s New Experimentation Platform (Part 2). Available: https://engineering.atspotify.com/2020/11/02/spotifys-new-experimentation-platform-part-2/