One increasingly popular approach to the multiple testing problem is to categorize statistical tests as either “confirmatory” or “exploratory.” This approach was taken in the subgroup analysis of follow-up data from the Supporting Healthy Marriage (SHM) Evaluation, an experimental evaluation of a relationship education program for low-income married couples with children. Three sets of subgroups, defined by race/ethnicity, family poverty level, and baseline marital distress, were pre-specified as confirmatory. The subgroup analysis (presented in Hsueh et al., 2012) examined whether the intervention was more effective for one group than another within these three sets of subgroups. Even with only three sets of subgroups, testing differences in impacts on 26 outcomes of interest produced a total of 78 statistical tests. With so many tests, the results were difficult to interpret and made apparent the need for additional information to guide interpretation. This situation is typical of subgroup analysis.
In this paper, we will describe three methods to assist in the interpretation of subgroup analysis results, and we will present the results of applying these methods to an exploratory subgroup analysis of data from the SHM Evaluation. Six sets of subgroups, defined by length of marriage, presence of young children, abuse in the family of origin, baseline psychological distress, presence of step-children, and family emotional support, will be examined for impacts on 20 survey outcomes, with a focus on whether impacts differ across subgroups. Two of the methods provide information about whether, collectively across the 20 outcomes, the intervention was more effective for one group than another. The third method provides additional information for identifying which outcomes have differential impacts across subgroups. The three methods, each illustrated with a brief code sketch after the list, are:
1) a MANCOVA-style test of whether some linear combination of the outcomes shows significant differences in impacts across subgroups
2) a resampling approach based on re-randomization that generates the probability of observing any given number of significant tests of differences in impacts when the true differences are zero for all outcomes
3) a resampling approach based on bootstrapping that provides adjusted p-values for all tests of differences in impacts (analogous to the “step-down bootstrap” in the SAS MULTTEST procedure).
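As a rough illustration of the first method, the sketch below runs a multivariate test of the treatment-by-subgroup interaction on simulated data. The variable names, the simulated data, and the use of only three outcomes (rather than 20) are assumptions for illustration, and the baseline covariates that would make this a MANCOVA rather than a MANOVA are omitted for brevity.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
n = 1000

# Simulated placeholder data; the real analysis would use the SHM outcomes
# and include baseline covariates (making the test a MANCOVA).
df = pd.DataFrame({
    "treat": rng.integers(0, 2, size=n),      # experimental assignment
    "subgroup": rng.integers(0, 2, size=n),   # e.g., more vs. less distressed
})
for j in range(3):                            # 3 outcomes stand in for the 20
    df[f"y{j}"] = rng.normal(size=n)

# Multivariate test of the treatment-by-subgroup interaction: significant if
# some linear combination of the outcomes shows differential impacts.
mv = MANOVA.from_formula("y0 + y1 + y2 ~ treat * subgroup", data=df)
print(mv.mv_test())
```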
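The second method can be sketched as follows. Permuting the treatment indicator stands in for re-running the random assignment under the null of no differential impact on any outcome; the data, variable names, and the bare interaction model (again without baseline covariates) are illustrative assumptions, not the SHM specification.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, n_outcomes = 2000, 20

treat = rng.integers(0, 2, size=n)            # experimental assignment
subgroup = rng.integers(0, 2, size=n)         # binary subgroup indicator
outcomes = rng.normal(size=(n, n_outcomes))   # placeholder outcome matrix

def n_significant(treat, subgroup, outcomes, alpha=0.05):
    """Count outcomes with a significant treatment-by-subgroup interaction."""
    X = np.column_stack([np.ones(len(treat)), treat, subgroup, treat * subgroup])
    return sum(
        sm.OLS(outcomes[:, j], X).fit().pvalues[3] < alpha
        for j in range(outcomes.shape[1])
    )

observed = n_significant(treat, subgroup, outcomes)

# Re-randomization: permute the treatment indicator to mimic redoing the
# random assignment when true differences are zero for all outcomes, and
# record how many of the 20 interaction tests come out significant each time.
null_counts = np.array([
    n_significant(rng.permutation(treat), subgroup, outcomes)
    for _ in range(1000)
])

# Probability of seeing at least this many significant tests by chance alone.
print((null_counts >= observed).mean())
```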
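Finally, a sketch of the third method in the spirit of the Westfall-Young step-down resampling that the SAS MULTTEST procedure implements. Here a simple two-sample t-test per outcome stands in for the covariate-adjusted tests of differential impacts, and centering each subgroup's outcomes enforces the null before bootstrapping; both simplifications are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, m = 500, 20

group = rng.integers(0, 2, size=n)     # binary subgroup indicator
y = rng.normal(size=(n, m))            # placeholder outcomes

def pvals(y, group):
    """Per-outcome p-values; a t-test stands in for the real impact contrasts."""
    return stats.ttest_ind(y[group == 1], y[group == 0]).pvalue

p_obs = pvals(y, group)
order = np.argsort(p_obs)              # most to least significant

# Enforce the null by centering each subgroup's outcomes, then bootstrap.
y0 = y.copy()
for g in (0, 1):
    y0[group == g] -= y[group == g].mean(axis=0)

B = 2000
exceed = np.zeros(m)
for _ in range(B):
    idx = rng.integers(0, n, size=n)   # resample rows with replacement
    p_star = pvals(y0[idx], group[idx])[order]
    # minimum of the resampled p-values over each outcome and all less
    # significant outcomes (the "step-down" part)
    succ_min = np.minimum.accumulate(p_star[::-1])[::-1]
    exceed += succ_min <= p_obs[order]

p_adj = np.maximum.accumulate(exceed / B)   # keep adjusted p-values monotone
adjusted = np.empty(m)
adjusted[order] = p_adj                     # back to the original outcome order
print(np.round(adjusted, 3))
```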
Note to APPAM committee: The authors believe that this paper would fit well on a panel on the multiple testing problem or on a general panel on evaluation methods.