## Panel Paper: What Are the Theoretical Differences Between HLM, RCSE, and Design-Based Impact Estimators?

Friday, November 3, 2017
Dusable (Hyatt Regency Chicago)


Tim Kautz, Mathematica Policy Research

The key feature of design-based theory is that it uses the random assignment process to build the impact estimation model (Imbens and Rubin, 2015; Schochet, 2015). These methods are non-parametric because they do not require assumptions on the distributions of outcomes or model structure. In contrast, commonly used model-based approaches specify an ad hoc model structure (for example, the standard OLS or HLM model) that is assumed to be true to ensure unbiased estimators. But it is not possible to fully verify these model assumptions.

This presentation will summarize results from a paper that formalizes the differences in underlying assumptions across the design-based, HLM, and RCSE approaches. These differences lead to different variance estimators and different weights for aggregating clusters and blocks. Understanding them can help analysts select an appropriate estimation method for their context.

To illustrate this idea, consider a clustered design where clusters are randomized to treatment or control groups. Under the simplest version of the design-based approach, where the data are aggregated to the cluster level, the observed mean outcome, YBARj, can be linked to the potential outcomes in the treatment and control conditions, YBARTj and YBARCj, as follows:

(1) YBARj = Tj*YBARTj + (1-Tj)*YBARCj,

where Tj equals 1 for treatment group clusters and 0 otherwise. Rearranging terms yields a regression model where YBARj is regressed on Tj, with the “error”, uj, defined by the randomization process:

(2) YBARj = a0 + a1*Tj + uj
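One way to carry out the rearrangement from (1) to (2) is sketched below. The symbols \mu_T and \mu_C, denoting the means of the cluster-level potential outcomes across the clusters in the sample, are assumptions introduced for illustration. The paper's own parameterization (for example, one that centers Tj) may differ.

```latex
\begin{aligned}
\bar{Y}_j &= T_j \bar{Y}_{Tj} + (1 - T_j)\,\bar{Y}_{Cj} \\
          &= \mu_C + (\mu_T - \mu_C)\,T_j
             + \bigl[\, T_j(\bar{Y}_{Tj} - \mu_T) + (1 - T_j)(\bar{Y}_{Cj} - \mu_C) \,\bigr] \\
          &= a_0 + a_1 T_j + u_j ,
\end{aligned}
\qquad
a_0 = \mu_C, \quad a_1 = \mu_T - \mu_C .
```

Under this sketch the potential outcomes are fixed quantities, so the only random element of u_j is the assignment indicator T_j. No distributional assumption on the outcomes is needed.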

Estimating this model using weighted least squares yields a difference-in-means estimator that, in large samples, is consistent and normally distributed with a simple variance estimator. Although this approach can weight clusters in several ways, the randomization mechanism aligns most closely with the equal weighting of clusters.
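As a minimal sketch of the equal-weighting case, the snippet below computes the difference in cluster-level means and a Neyman-style standard error. The function name, the made-up data, and the choice of this particular variance formula are assumptions for illustration. The paper's variance estimator may use different degrees-of-freedom adjustments.

```python
import numpy as np

def design_based_impact(cluster_means, treat):
    """Difference in cluster-level means, weighting clusters equally."""
    y = np.asarray(cluster_means, dtype=float)
    t = np.asarray(treat, dtype=int)
    y_t, y_c = y[t == 1], y[t == 0]
    impact = y_t.mean() - y_c.mean()
    # Neyman-style variance (an assumed choice): the sample variance of the
    # cluster means in each arm, divided by the number of clusters in that arm.
    var = y_t.var(ddof=1) / y_t.size + y_c.var(ddof=1) / y_c.size
    return impact, np.sqrt(var)

# Hypothetical cluster means: four treatment and four control clusters.
cluster_means = [12.0, 15.0, 11.0, 14.0, 9.0, 10.0, 13.0, 8.0]
treat = [1, 1, 1, 1, 0, 0, 0, 0]

impact, se = design_based_impact(cluster_means, treat)
print(f"impact = {impact:.3f}, se = {se:.3f}")
```

With only a treatment indicator and equal weights, the coefficient a1 in (2) is the same quantity as this difference in means.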

In contrast, HLM methods start with assumptions about the data generating process that are not directly linked to the potential outcomes framework. The standard HLM model for clustered RCTs assumes the following relationship:

(3) Yij = b0 + b1*Tj + vj + eij,

where Yij is the outcome of individual i in cluster j, vj is a cluster random effect, and eij is an individual error. Unlike the design-based approach, the errors in (3) are assumed to have a particular structure and distribution. By default, HLM uses precision weighting to weight clusters.
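To make the weighting difference concrete, the sketch below compares precision weights with the two other schemes discussed in this abstract. It assumes the variance components of model (3), here called `tau2` (variance of vj) and `sigma2` (variance of eij), are known. That is a simplification, since HLM software estimates them from the data. All names and numbers are hypothetical.

```python
import numpy as np

def precision_weights(n_j, tau2, sigma2):
    """Inverse variance of each cluster mean under model (3)."""
    n_j = np.asarray(n_j, dtype=float)
    return 1.0 / (tau2 + sigma2 / n_j)

def weighted_impact(cluster_means, treat, weights):
    """Difference in weighted averages of cluster means across arms."""
    y = np.asarray(cluster_means, dtype=float)
    t = np.asarray(treat, dtype=int)
    w = np.asarray(weights, dtype=float)
    mean_t = np.average(y[t == 1], weights=w[t == 1])
    mean_c = np.average(y[t == 0], weights=w[t == 0])
    return mean_t - mean_c

# Hypothetical data: two small and two large clusters.
n_j = [10, 40, 10, 40]
cluster_means = [12.0, 16.0, 9.0, 11.0]
treat = [1, 1, 0, 0]
tau2, sigma2 = 1.0, 10.0  # assumed known for illustration only

impact_precision = weighted_impact(
    cluster_means, treat, precision_weights(n_j, tau2, sigma2)
)
impact_equal_clusters = weighted_impact(cluster_means, treat, np.ones(len(n_j)))
impact_equal_individuals = weighted_impact(cluster_means, treat, n_j)
print(impact_equal_clusters, impact_precision, impact_equal_individuals)
```

In this example the precision-weighted estimate falls between the equal-cluster and equal-individual estimates. As `tau2` grows relative to `sigma2`, the precision weights approach equal cluster weights.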

The OLS model with RCSE corrections (Liang and Zeger, 1986) is

(4) Yij = c0 + c1*Tj + eij,

where the key assumption is that the errors are uncorrelated across clusters, but may be correlated within clusters in arbitrary ways (unlike HLM). Similar to the design-based methods, the errors are not assumed to follow a particular distribution, but, by default, RCSE methods weight individuals equally for the analysis.
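A minimal NumPy sketch of model (4) with a cluster-robust sandwich variance follows. It omits the finite-sample corrections that statistical packages commonly apply, and the function name and data are hypothetical.

```python
import numpy as np

def ols_cluster_robust(y, t, cluster):
    """OLS of y on an intercept and t, with cluster-robust standard errors."""
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    X = np.column_stack([np.ones_like(y), np.asarray(t, dtype=float)])
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    # Sum the outer products of the within-cluster score vectors, which
    # allows arbitrary error correlation inside each cluster.
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster):
        idx = cluster == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)
    cov = bread @ meat @ bread  # no small-sample adjustment (assumption)
    return beta, np.sqrt(np.diag(cov))

# Hypothetical individual-level data in four clusters of unequal size.
y = [12, 14, 15, 17, 16, 9, 11, 10, 12, 8]
t = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
cluster = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]

beta, se = ols_cluster_robust(y, t, cluster)
print(f"c1 = {beta[1]:.3f}, cluster-robust se = {se[1]:.3f}")
```

Because each individual receives equal weight, the estimate of c1 is the difference in individual-level means, so larger clusters contribute more to the estimate.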

Even when the impact parameters (weighting schemes) are identical for the design-based and model-based methods, the standard errors differ because the estimation approaches differ. The presentation will discuss these issues and extensions, such as models with covariates and blocked designs. It will also summarize which features of the data generating process would lead to the biggest differences in estimates between the methods.