Practical constraints in A/B testing

I saw an article about an A/B test that Google performed some time ago. They wanted to decide what shade of blue a button should be, based on how the shade affects click-through rate. They divided users randomly into 100 buckets, each corresponding to a shade of blue they wanted to check (so the color is a factor with 100 levels).
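To make the bucketing step concrete, here is a minimal sketch of one common way to split users randomly into 100 buckets: hashing a user ID together with an experiment-specific salt. This is an assumption for illustration, not necessarily what the article's test did; the function name `assign_bucket`, the salt string, and the user ID format are all hypothetical.

```python
import hashlib

NUM_BUCKETS = 100  # one bucket per shade of blue, as in the article


def assign_bucket(user_id: str, salt: str = "blue-button-test") -> int:
    """Map a user ID to a bucket index in [0, NUM_BUCKETS).

    Hashing (salt + user_id) gives an assignment that is effectively
    random across users but stable for any single user, so the same
    person always sees the same shade. The salt is a hypothetical
    experiment name; changing it reshuffles users for a new test.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


if __name__ == "__main__":
    for uid in ["user-1", "user-2", "user-3"]:
        print(uid, "-> bucket", assign_bucket(uid))
```

With this kind of assignment each bucket receives roughly 1/100 of the traffic, which is exactly why the number of users per bucket becomes the limiting factor when traffic is small.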

This is all well and good if every bucket (or treatment group) sufficiently represents the target population. In other words, Google had the liberty of taking a large enough random sample from the target population for each bucket.

My question is: how would the test/sampling/bucketing/analysis have to be modified if there is a constraint on the number of replications?
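To illustrate why the number of replications matters, here is a rough sketch of how many users each bucket would need in order to detect a given difference in click-through rate. It assumes a standard two-sided two-proportion z-test with the normal approximation, and a Bonferroni correction when several shades are compared against a baseline. The function name `users_per_bucket` and the rates used below are made up for illustration.

```python
import math
from statistics import NormalDist


def users_per_bucket(p_base: float, p_alt: float, alpha: float = 0.05,
                     power: float = 0.8, comparisons: int = 1) -> int:
    """Approximate users needed in EACH bucket to detect p_base vs p_alt.

    Uses the normal-approximation formula for a two-sided test of two
    proportions. `comparisons` applies a Bonferroni correction, i.e. the
    significance level is divided by the number of comparisons made.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / (2 * comparisons))
    z_beta = std_normal.inv_cdf(power)
    variance = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_base - p_alt) ** 2)


if __name__ == "__main__":
    # Hypothetical rates: 10% baseline click-through vs. 12% for a new shade.
    print("single comparison:", users_per_bucket(0.10, 0.12))
    print("99 comparisons:   ", users_per_bucket(0.10, 0.12, comparisons=99))
```

Under these assumptions the required sample per bucket grows as more shades are tested, so a firm with limited traffic would have to test fewer levels, accept a larger detectable difference, or use a different design.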

For those experienced in experiment design or A/B testing, I would be extremely grateful if you could include some other practical problems or constraints that arise in A/B testing - the kind that an average firm faces.

Topic ab-test experiments

Category Data Science
