Measuring Consequential Covariate Imbalance
The particular case of propensity score matching has prompted reconsideration of the traditional use of statistical significance tests to assess balance between treatment and control groups (Austin, 2009b). For example, heuristic cutoffs for standardized differences (e.g., treating a covariate as imbalanced when its standardized difference exceeds 0.1 or 0.25) have been proposed as more useful assessments of covariate balance than traditional significance tests, whose results depend on sample size. Furthermore, some researchers have interpreted the balance on a covariate across treatments in light of that covariate's prognostic value for the outcome (Ho et al., 2007). The intuitive appeal of bringing in the relationship between a covariate and the outcome is that it captures how consequential a between-group difference on that covariate is for estimating a treatment effect.
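As a concrete reference point for the standardized-difference heuristic, the minimal sketch below computes the common pooled-standard-deviation form for a continuous covariate, d = (mean_t - mean_c) / sqrt((s_t^2 + s_c^2) / 2), and compares |d| with a cutoff such as 0.1. The function names `standardized_difference` and `flag_imbalance` and the default cutoff are illustrative assumptions, not part of any routine cited here.

```python
import math
from statistics import mean, variance


def standardized_difference(treated, control):
    """Standardized mean difference for a continuous covariate:
    difference in group means divided by the pooled standard deviation
    sqrt((s_t^2 + s_c^2) / 2), using sample variances (n - 1 denominator)."""
    diff = mean(treated) - mean(control)
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return diff / pooled_sd


def flag_imbalance(d, threshold=0.1):
    """Heuristic rule: treat the covariate as imbalanced if |d| exceeds the cutoff
    (0.1 here; 0.25 is another commonly cited value)."""
    return abs(d) > threshold


d = standardized_difference([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
print(d, flag_imbalance(d))
```

Unlike a t-test p-value, d does not shrink or grow mechanically with sample size, which is why such cutoffs are favored for balance checks.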
In keeping with the theme of this year's conference, we operationalize this intuition by developing a measure of covariate balance between treatment groups that incorporates the relationship between the covariate and the outcome. In particular, we use Frank’s (2000) impact of a covariate, defined as the product of the correlation between the covariate and treatment assignment and the correlation between the covariate and the outcome. We demonstrate that, in a theoretical randomized experiment, the relationship between expected impact and sample size is highly non-linear. This non-linearity provides a natural threshold for judging whether a covariate is balanced across groups. The result is a covariate-specific threshold tied to the amount of bias that imbalance on that covariate can introduce: a covariate is considered imbalanced if its impact exceeds the calculated threshold.
We illustrate our approach with an example of propensity score adjustment used to estimate the effect of kindergarten retention on achievement (Hong and Raudenbush, 2005). We also examine the sensitivity of our approach to its key assumptions, discuss other potential applications and implications for causal inference in public policy and management, and present Stata and R routines for calculating the threshold.