Treatment and Control selection in A/B Testing

I'm hoping to get a better understanding of A/B testing design. In particular, I'm interested in how treatment and control units are selected. I read that these two groups are selected randomly (for example, here), but there are also approaches where, after picking the treatment group (either randomly or not), the control group is selected based on "similarity" to the treatment group. Are both approaches valid, and what's the rationale for picking one or the other?

For example, Alteryx has specific Treatment and Control tools for this purpose, and they are not random (they use nearest-neighbor methods).

Topic causalimpact ab-test statistics

Category Data Science


If I understand your question correctly, you are describing two different things: 1) A/B testing and 2) case-control studies.

  1. Think back to stats class: you did not learn about A/B testing, but you did learn hypothesis testing, which also goes by null hypothesis testing, null-versus-alternative testing, or sometimes a one-tailed or two-tailed test. Give or take, these all refer to the same idea. Like many ideas, the name changed once it left academia.

    • One typical experiment is to take 300 stores and assign each one, at random, to sell either A or B. After a day, a week, or a month, you measure the amounts sold and ask, 'Did the two items sell statistically different volumes?'
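
The store experiment above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the sales figures are simulated, the helper names `assign_groups` and `welch_test` are made up for this example, and the p-value uses a normal approximation to Welch's t-test, which is only reasonable because each group has 150 stores.

```python
import random
import statistics
from math import sqrt


def assign_groups(unit_ids, seed=0):
    """Randomly split units into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def welch_test(sales_a, sales_b):
    """Return Welch's t statistic and a two-sided p-value.

    The p-value comes from a standard normal approximation, which is an
    assumption that only holds for reasonably large groups.
    """
    mean_a, mean_b = statistics.mean(sales_a), statistics.mean(sales_b)
    var_a, var_b = statistics.variance(sales_a), statistics.variance(sales_b)
    std_err = sqrt(var_a / len(sales_a) + var_b / len(sales_b))
    t_stat = (mean_a - mean_b) / std_err
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


# 300 stores, randomly assigned to sell item A or item B.
group_a, group_b = assign_groups(range(300), seed=42)

# Simulated sales volumes (hypothetical numbers, for illustration only).
rng = random.Random(1)
sales_a = [rng.gauss(100, 15) for _ in group_a]
sales_b = [rng.gauss(105, 15) for _ in group_b]

t_stat, p_value = welch_test(sales_a, sales_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Because assignment is random, any systematic difference between the stores (size, location, foot traffic) is expected to be spread evenly across both groups, which is what lets the test attribute a difference in sales to the item itself.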

The second idea I read in your question considers a different way to set up your experiments.

  2. Another type of study is called a 'case-control study.'

    • Say I want to study people this time. I can't in good conscience withhold medicine from a person with a bad heart, and I can't split elderly patients into two groups and see who dies. So the study has to be retrospective: I go to the hospital records and look for all the patients who A) did or B) did not take the medication over the past x years.

    • For example, I find a 64-year-old male with heart problems who takes blood thinners. Then I look for a male of approximately 64 with heart problems who did not take blood thinners, and I compare, say, liver function between the two. I go case by case, looking for similarity among the patients in my study. Alternatively, if I choose a large enough sample of patients who A) took blood thinners or B) did not, I can hope that most other factors roughly balance out, although without random assignment that is not guaranteed.
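
The case-by-case matching described above can be sketched as a greedy nearest-neighbor search. This is a rough illustration under stated assumptions, not how Alteryx or any particular tool implements it: the `Patient` record, the `match_controls` helper, the `max_age_gap` tolerance, and the patient data are all hypothetical, and matching uses only sex (exact) and age (nearest).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    pid: str
    age: int
    sex: str
    took_thinners: bool


def match_controls(patients, max_age_gap=3):
    """Pair each treated patient with the closest-aged untreated patient
    of the same sex. Each control is used at most once."""
    treated = [p for p in patients if p.took_thinners]
    pool = [p for p in patients if not p.took_thinners]
    pairs = []
    for t in treated:
        candidates = [
            c for c in pool
            if c.sex == t.sex and abs(c.age - t.age) <= max_age_gap
        ]
        if not candidates:
            continue  # no acceptable match, so this patient is left out
        best = min(candidates, key=lambda c: abs(c.age - t.age))
        pool.remove(best)  # match without replacement
        pairs.append((t, best))
    return pairs


# Hypothetical records: all patients are assumed to have heart problems.
patients = [
    Patient("T1", 64, "M", True),
    Patient("T2", 71, "F", True),
    Patient("C1", 65, "M", False),
    Patient("C2", 58, "M", False),
    Patient("C3", 70, "F", False),
    Patient("C4", 40, "F", False),
]

pairs = match_controls(patients)
for t, c in pairs:
    print(f"{t.pid} (age {t.age}, {t.sex}) matched with {c.pid} (age {c.age}, {c.sex})")
```

Matching only balances the characteristics you explicitly match on. Randomization, by contrast, balances measured and unmeasured characteristics on average, which is the main reason to prefer it whenever it is feasible and ethical.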
