Panel Paper: Multi-Armed Trials for Informing Policy and Practice: A 50-Year Retrospective

Saturday, November 10, 2018
Wilson B - Mezz Level (Marriott Wardman Park)


Larry Orr, Johns Hopkins University

Some of the earliest social experiments were multi-armed trials. The income maintenance experiments of the 1960s and early 1970s all tested multiple variants of the negative income tax, assigning families to different combinations of the basic income guarantee and benefit reduction (“tax”) rate. The Housing Allowance Demand Experiment tested two different types of rent subsidy. And the Health Insurance Experiment tested multiple combinations of coinsurance and limits on out-of-pocket costs. Over the ensuing decades, a number of tests of public policy have adopted multi-armed designs.

In this paper, we explore the reasons why multi-armed trials have been conducted and the tradeoffs involved in choosing them over other designs. We argue that there are four fundamental reasons for such designs:

  1. Multi-armed designs allow the estimation of “response surfaces” – i.e., the variation in response across a range of a continuous policy parameter or multi-parameter policy space.
  2. Multi-armed designs allow the estimation of the separate and combined effects of discrete program components.
  3. Multi-armed trials allow the comparison of alternative policy approaches to the same social problem.
  4. Multi-armed trials are an efficient way to test multiple policy approaches simultaneously.

We illustrate each of these objectives with examples from the history of social experimentation over the past 50 years.
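Reasons 1 and 2 can be made concrete with a small simulation. The sketch below is purely illustrative (the response function, effect sizes, and sample sizes are invented, not taken from the income maintenance experiments): it randomly assigns families to a 2x2 factorial of income-guarantee levels and benefit-reduction rates, then recovers the main effect of the guarantee from cell means — something a single-treatment design could not separate from the tax-rate effect.

```python
import random
random.seed(42)

GUARANTEES = (0.50, 0.75)   # income guarantee, as a fraction of the poverty line
TAX_RATES  = (0.30, 0.70)   # benefit-reduction ("tax") rate

def simulate_hours(g, t):
    # Hypothetical response surface: higher guarantees and higher
    # tax rates both reduce annual work hours (invented numbers).
    return 1800 - 200 * g - 150 * t + random.gauss(0, 50)

# Randomly assign families to the four guarantee/tax-rate cells.
cells = {(g, t): [] for g in GUARANTEES for t in TAX_RATES}
for _ in range(4000):
    g = random.choice(GUARANTEES)
    t = random.choice(TAX_RATES)
    cells[(g, t)].append(simulate_hours(g, t))

mean = {arm: sum(v) / len(v) for arm, v in cells.items()}

# Main effect of raising the guarantee, averaged over tax rates.
# With four cells we can estimate each parameter's separate effect
# (and their interaction), not just one bundled treatment effect.
guarantee_effect = (
    (mean[(0.75, 0.30)] + mean[(0.75, 0.70)])
    - (mean[(0.50, 0.30)] + mean[(0.50, 0.70)])
) / 2
print(round(guarantee_effect, 1))  # close to -200 * (0.75 - 0.50) = -50
```

With roughly 1,000 families per cell, the estimated main effect lands within a few hours of the true -50, illustrating how a factorial allocation traces out the response surface across the policy-parameter space.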

Each of these objectives implies a different experimental design. The choice among these designs is driven largely by the policy question to be addressed: design of an intervention in terms of policy parameters or components; choice among several well-defined interventions; or interest in the net impacts of several different interventions. But these designs also involve a number of tradeoffs among design parameters (e.g., sample size and allocation) that may affect both the detailed design of the trial and its objectives. Among these tradeoffs are:

  1. The number of arms in the trial vs. statistical power for the estimation of net and/or differential impacts of any one intervention.
  2. The choice between being able to compare the impacts of different interventions (Objective 3 above) vs. simply estimating the net impact of each (Objective 4).
  3. The choice between simply comparing the impacts of two or more interventions vs. also being able to estimate their net impacts relative to the status quo.
  4. The choice between testing interventions on the populations that would actually receive them in an ongoing program, where operators can choose among multiple interventions, vs. estimating the impacts of those interventions on comparable populations.
  5. The choice between estimating impacts for the sample vs. for the larger population from which the sample was drawn.

We will explore each of these choices, as they relate to cost, statistical power, and the information obtained, with reference to actual trials.
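The first tradeoff can be quantified with a standard power calculation. The sketch below uses illustrative assumptions (equal allocation across arms, outcome standard deviation normalized to 1, two-sided alpha of .05, 80 percent power, known-variance approximation) to show how the minimum detectable effect for any pairwise contrast grows as a fixed total sample is split across more arms.

```python
from statistics import NormalDist

def mde(n_per_arm, sigma=1.0, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-arm comparison with
    n_per_arm observations in each arm (equal allocation,
    two-sided test, normal approximation)."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) * (2 * sigma**2 / n_per_arm) ** 0.5

# A hypothetical fixed total sample of 3,000 split evenly across
# k arms (one control plus k - 1 treatments): more arms means
# fewer observations per arm, hence a larger MDE per contrast.
for k in (2, 3, 5):
    print(k, "arms:", round(mde(3000 // k), 3))
```

Going from two arms to five raises the minimum detectable effect by nearly 60 percent here, which is why adding arms to answer more questions can quietly erode the trial's ability to answer any one of them.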
