Why Use Design-Based Methods for RCTs?

Schochet, Peter Z.

Design-based methods for experimental designs were introduced by Neyman (1923) and later developed in seminal works by Rubin (1974, 1977) and Holland (1986) using a potential outcomes framework. More recently, this work has been further developed in Imbens and Rubin (2015) and Schochet (2015). Under the simplest design where individuals are randomized to a single treatment or control group, design-based theory is based on the following data generating process for the observed outcome for an individual (y_i):

(1) y_i = T_iY_i(1) + (1-T_i)Y_i(0).

In this expression, Y_i(1) is the potential outcome for individual i in the treatment condition, Y_i(0) is the potential outcome for the same individual in the control condition, and T_i is the treatment status indicator variable, equal to 1 for individuals assigned to the treatment group and 0 for those assigned to the control group.
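As a minimal sketch of how Equation (1) works, the Python code below builds observed outcomes from potential outcomes and a treatment indicator, then computes the simple difference in means between the treatment and control groups. The potential outcome values, the function names, and the choice of the difference-in-means estimator are illustrative assumptions rather than material from the paper. Because the example is tiny, it enumerates every possible random assignment and averages the resulting estimates, which can then be compared with the true average treatment effect.

```python
import itertools
import statistics


def observed_outcomes(y1, y0, t):
    # Equation (1): y_i = T_i * Y_i(1) + (1 - T_i) * Y_i(0)
    return [ti * a + (1 - ti) * b for a, b, ti in zip(y1, y0, t)]


def difference_in_means(y, t):
    # Mean observed outcome of the treatment group minus that of the control group.
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    return statistics.mean(treated) - statistics.mean(control)


def all_assignments(n, n_treated):
    # Every way to assign n_treated of n individuals to the treatment group.
    for chosen in itertools.combinations(range(n), n_treated):
        yield [1 if i in chosen else 0 for i in range(n)]


# Hypothetical potential outcomes for four individuals (illustrative values only).
Y1 = [5.0, 7.0, 6.0, 9.0]
Y0 = [4.0, 4.0, 5.0, 6.0]

# True average treatment effect; computable here only because both
# potential outcomes are known in this simulated example.
true_ate = statistics.mean([a - b for a, b in zip(Y1, Y0)])

# Estimate under each possible assignment of two individuals to treatment.
estimates = [
    difference_in_means(observed_outcomes(Y1, Y0, t), t)
    for t in all_assignments(len(Y1), 2)
]
mean_estimate = statistics.mean(estimates)
```

In this sketch the potential outcomes are held fixed and only T_i varies across assignments, so comparing mean_estimate with true_ate checks unbiasedness over the randomization distribution.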

The presentation will demonstrate how the simple relation in Equation (1) can be used to develop consistent impact and variance estimators that are asymptotically normally distributed. The approach can accommodate designs where the only source of randomness is T_i (the finite population [FP] model), as well as designs where the potential outcomes are also assumed to be randomly sampled from broader populations (the super-population [SP] model). The approach can also be adapted to clustered designs (for example, where schools or hospitals are randomized) where Equation (1) is averaged to the cluster level. Finally, this approach can be extended to blocked designs where random assignment is conducted within blocks that can be assumed to be fixed for the sample or randomly sampled from a broader set of blocks.
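For a clustered design, one way to picture averaging Equation (1) to the cluster level is sketched below in Python. Individual outcomes are collapsed to cluster means, and the impact and its variance are then estimated from those means alone. The data, the function names, the equal weighting of clusters, and the use of the familiar two-sample variance formula are assumptions made for illustration; the paper's estimators may weight clusters differently or apply other adjustments.

```python
import statistics
from collections import defaultdict


def cluster_means(outcomes, cluster_ids):
    # Collapse individual-level outcomes to one average per cluster.
    groups = defaultdict(list)
    for y, c in zip(outcomes, cluster_ids):
        groups[c].append(y)
    return {c: statistics.mean(values) for c, values in groups.items()}


def cluster_impact(means, assignment):
    # assignment maps each cluster id to 1 (treatment) or 0 (control).
    treated = [m for c, m in means.items() if assignment[c] == 1]
    control = [m for c, m in means.items() if assignment[c] == 0]
    # Difference in average cluster means, weighting clusters equally.
    estimate = statistics.mean(treated) - statistics.mean(control)
    # Sum of the group-specific sample variances of cluster means,
    # each divided by the number of clusters in that group.
    variance = (
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return estimate, variance


# Hypothetical data: eight individuals in four clusters (for example, schools).
outcomes = [1.0, 3.0, 2.0, 4.0, 10.0, 12.0, 11.0, 13.0]
cluster_ids = ["a", "a", "b", "b", "c", "c", "d", "d"]
assignment = {"a": 0, "b": 0, "c": 1, "d": 1}

means = cluster_means(outcomes, cluster_ids)
estimate, variance = cluster_impact(means, assignment)
```

Note that cluster_impact never touches the individual-level records: once the cluster averages are in hand, they are the only inputs needed.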

The presentation will focus on the advantages of design-based methods relative to model-based approaches, such as the hierarchical linear model (HLM) methods that are typically used in social policy research. First, design-based methods require only finite moment assumptions on the distributions of potential outcomes, whereas model-based approaches often rely on a normality assumption that must hold for the estimates to be consistent. Second, design-based approaches produce closed-form expressions for the average treatment effect (ATE) estimators, unlike HLM methods, which require iterative, numerical maximum likelihood procedures for estimation. Third, for clustered designs, data requirements are fewer for the design-based approach because the analysis can be conducted using data on cluster-level averages rather than individual-level data. Finally, unlike commonly used model-based approaches, the design-based framework allows for heterogeneity of treatment effects, which leads to variance expressions that differ for the treatment and control groups, and that differ for the FP and SP models.
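To illustrate how treatment effect heterogeneity separates the FP and SP variance expressions, the sketch below uses Neyman's classical result for a completely randomized design with N individuals, n_1 of them treated and n_0 controls. Under the FP model, the variance of the difference-in-means estimator is S_1^2/n_1 + S_0^2/n_0 - S_tau^2/N, where S_1^2, S_0^2, and S_tau^2 are the variances (with N-1 divisors) of Y_i(1), Y_i(0), and the individual effects Y_i(1) - Y_i(0). The SP-style expression used here is the sample analog that omits the last term. The function name and inputs are hypothetical, and the calculation is possible only in a simulation, because both potential outcomes are never observed for the same individual in practice.

```python
import statistics


def design_based_variances(y1, y0, n_treated):
    # y1, y0: full vectors of potential outcomes (known only in a simulation).
    n = len(y1)
    n_control = n - n_treated
    s1_sq = statistics.variance(y1)  # variance of treatment potential outcomes
    s0_sq = statistics.variance(y0)  # variance of control potential outcomes
    # Variance of individual treatment effects; zero when effects are constant.
    s_tau_sq = statistics.variance([a - b for a, b in zip(y1, y0)])
    # Group-specific terms: treatment and control variances enter separately.
    sp_variance = s1_sq / n_treated + s0_sq / n_control
    # The FP expression subtracts the heterogeneity term.
    fp_variance = sp_variance - s_tau_sq / n
    return fp_variance, sp_variance


# Heterogeneous effects (hypothetical values): the two expressions differ.
fp_het, sp_het = design_based_variances([1.0, 3.0], [0.0, 0.0], 1)

# Constant effects (hypothetical values): the two expressions coincide.
fp_const, sp_const = design_based_variances([1.0, 2.0], [0.0, 1.0], 1)
```

Since S_tau^2 cannot be estimated from observed data, a variance estimator built from the first two terms alone is conservative under the FP model whenever effects vary across individuals.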

Association for Public Policy Analysis & Management

Panel Paper: Why Use Design-Based Methods for RCTs?