What Are the Theoretical Differences Between HLM, RCSE, and Design-Based Impact Estimators?

Kautz, Tim

The key feature of design-based theory is that it uses the random assignment process to build the impact estimation model (Imbens and Rubin, 2015; Schochet, 2015). These methods are non-parametric because they do not require assumptions on the distributions of outcomes or model structure. In contrast, commonly used model-based approaches specify an ad hoc model structure (for example, the standard OLS or HLM model) that is assumed to be true to ensure unbiased estimators. But it is not possible to fully verify these model assumptions.

This presentation will summarize results from a paper that formalizes the differences in underlying assumptions across the design-based, HLM, and robust cluster standard error (RCSE) approaches. These differences lead to different variance estimators and different weights for aggregating clusters and blocks. Understanding them can help analysts select an appropriate estimation method for their context.

To illustrate this idea, consider a clustered design where clusters are randomized to treatment or control groups. Under the simplest version of the design-based approach, where the data are aggregated to the cluster level, the observed mean outcome, YBAR_j, can be linked to the potential outcomes in the treatment and control conditions, YBAR_Tj and YBAR_Cj, as follows:

(1) YBAR_j = T_j*YBAR_Tj + (1 - T_j)*YBAR_Cj,

where T_j equals 1 for treatment group clusters and 0 otherwise. Rearranging terms yields a regression model where YBAR_j is regressed on T_j, with the “error”, u_j, defined by the randomization process:

(2) YBAR_j = a0 + a1*T_j + u_j
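One way to make the rearrangement from (1) to (2) explicit is sketched below. It assumes that a0 and a1 denote averages over the m clusters in the sample (the symbol m and these definitions are assumptions of this sketch; the paper's exact definitions may differ). Under that reading, the potential outcomes are fixed and T_j is the only random element of u_j.

```latex
\begin{align*}
\bar{Y}_j &= T_j \bar{Y}_{Tj} + (1 - T_j)\,\bar{Y}_{Cj}
           = \bar{Y}_{Cj} + T_j\,(\bar{Y}_{Tj} - \bar{Y}_{Cj}) \\
          &= a_0 + a_1 T_j + u_j, \\
a_0 &= \frac{1}{m}\sum_{k=1}^{m} \bar{Y}_{Ck}, \qquad
a_1 = \frac{1}{m}\sum_{k=1}^{m} \left(\bar{Y}_{Tk} - \bar{Y}_{Ck}\right), \\
u_j &= \left(\bar{Y}_{Cj} - a_0\right)
     + T_j\left[\left(\bar{Y}_{Tj} - \bar{Y}_{Cj}\right) - a_1\right].
\end{align*}
```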

Estimating this model using weighted least squares yields a difference-in-means estimator that, in large samples, is consistent and normally distributed with a simple variance estimator. Although this approach can weight clusters in several ways, the randomization mechanism aligns most closely with equal weighting of clusters.
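A minimal sketch of the equal-weighting case may help fix ideas. It computes the difference in means of cluster-level averages and checks that it matches the least-squares slope from model (2). The variance formula used here is a Neyman-style estimator, which is an assumption of this sketch rather than the paper's stated estimator, and the data and the function name `design_based_impact` are hypothetical.

```python
import numpy as np

def design_based_impact(cluster_means, treat):
    """Difference in means of cluster averages, weighting clusters equally."""
    y = np.asarray(cluster_means, dtype=float)
    t = np.asarray(treat, dtype=int)
    y_t, y_c = y[t == 1], y[t == 0]
    impact = y_t.mean() - y_c.mean()
    # Assumed Neyman-style variance: sum of within-arm sample variances
    # of the cluster means, each divided by the number of clusters in the arm.
    var = y_t.var(ddof=1) / y_t.size + y_c.var(ddof=1) / y_c.size
    return impact, np.sqrt(var)

# Hypothetical cluster-level mean outcomes (YBAR_j) and assignments (T_j).
cluster_means = [12.0, 15.0, 11.0, 14.0, 9.0, 10.0, 8.0, 13.0]
treat = [1, 1, 1, 1, 0, 0, 0, 0]

impact, se = design_based_impact(cluster_means, treat)

# The same impact is the slope from regressing YBAR_j on T_j, as in model (2).
slope = np.polyfit(treat, cluster_means, 1)[0]

print(f"impact = {impact:.3f}, se = {se:.3f}, OLS slope = {slope:.3f}")
```

With these illustrative numbers the treatment and control cluster means average 13 and 10, so the estimated impact is 3.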

In contrast, HLM methods start with assumptions about the data generating process that are not directly linked to the potential outcomes framework. The standard HLM model for clustered RCTs assumes the following relationship:

(3) Y_ij = b0 + b1*T_j + v_j + e_ij,

where Y_ij is the outcome of individual i in cluster j, v_j is a cluster random effect, and e_ij is an individual error. Unlike the design-based approach, the errors in (3) are assumed to have a particular structure and distribution. By default, HLM uses precision weighting to weight clusters.
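Precision weighting can be sketched as follows, under the simplifying assumption that the variance components are known. Under model (3), the variance of a cluster mean is tau2 + sigma2/n_j, where tau2 is the variance of v_j, sigma2 is the variance of e_ij, and n_j is the cluster size, so each cluster is weighted by the inverse of that quantity. In practice HLM software estimates the variance components from the data; here they are taken as given, and all names and numbers are hypothetical.

```python
import numpy as np

def precision_weighted_impact(cluster_means, cluster_sizes, treat, tau2, sigma2):
    """Difference in precision-weighted means of cluster averages.

    tau2 and sigma2 are treated as known (an assumption of this sketch).
    """
    y = np.asarray(cluster_means, dtype=float)
    n = np.asarray(cluster_sizes, dtype=float)
    t = np.asarray(treat, dtype=int)
    # Inverse of Var(YBAR_j) = tau2 + sigma2 / n_j under model (3).
    w = 1.0 / (tau2 + sigma2 / n)
    mean_t = np.average(y[t == 1], weights=w[t == 1])
    mean_c = np.average(y[t == 0], weights=w[t == 0])
    return mean_t - mean_c, w

# Hypothetical data: two treatment and two control clusters of unequal size.
cluster_means = [10.0, 14.0, 8.0, 9.0]
cluster_sizes = [10, 40, 10, 40]
treat = [1, 1, 0, 0]

impact_hlm, weights = precision_weighted_impact(
    cluster_means, cluster_sizes, treat, tau2=1.0, sigma2=10.0
)

# With no between-cluster variance, weights are proportional to cluster size,
# which is the same as weighting individuals equally.
impact_no_re, _ = precision_weighted_impact(
    cluster_means, cluster_sizes, treat, tau2=0.0, sigma2=10.0
)

print(impact_hlm, impact_no_re)
```

The two calls illustrate why the weighting matters: as tau2 shrinks toward zero the weights approach cluster-size weighting, and as tau2 grows relative to sigma2/n_j they approach equal weighting of clusters.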

The OLS model with RCSE corrections (Liang and Zeger, 1986) is

(4) Y_ij = c0 + c1*T_j + e_ij,

where the key assumption is that the errors are uncorrelated across clusters but may be correlated within clusters in arbitrary ways (unlike HLM). As with the design-based methods, the errors are not assumed to follow a particular distribution. By default, however, RCSE methods weight individuals equally in the analysis.
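The sketch below illustrates model (4) with a cluster-robust "sandwich" variance in the spirit of Liang and Zeger (1986). It omits the small-sample corrections that statistical packages typically apply, which is a simplification of this sketch, and the data and the function name `ols_cluster_robust` are hypothetical.

```python
import numpy as np

def ols_cluster_robust(y, t, cluster):
    """OLS of y on an intercept and t, with cluster-robust standard errors.

    Uses the basic sandwich form with no small-sample correction
    (a simplification of this sketch).
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    cluster = np.asarray(cluster)
    X = np.column_stack([np.ones_like(t), t])
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    # Sum the outer products of cluster-level score vectors X_j' e_j,
    # which allows arbitrary error correlation within each cluster.
    meat = np.zeros((2, 2))
    for g in np.unique(cluster):
        idx = cluster == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

# Hypothetical individual-level data in four clusters of unequal size.
y = np.array([10, 12, 13, 15, 14, 14, 7, 9, 9, 10, 8, 9], dtype=float)
t = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
cluster = np.array([0, 0, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3])

beta, se = ols_cluster_robust(y, t, cluster)

# The OLS coefficient on T_j is the pooled difference in individual-level
# means, so every individual carries equal weight.
pooled_diff = y[t == 1].mean() - y[t == 0].mean()

print(beta[1], pooled_diff, se[1])
```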

Even when the impact parameters (weighting schemes) are identical for the design-based and model-based methods, however, the standard errors differ because the estimation approaches differ. The presentation will discuss these issues and extensions, such as models with covariates and blocked designs. It will also summarize which features of the data generating process would lead to the largest differences in estimates between the methods.

Association for Public Policy Analysis & Management

Panel Paper: What Are the Theoretical Differences Between HLM, RCSE, and Design-Based Impact Estimators?