|dc.description.abstract||The three chapters in this dissertation develop new methods within experimental design and causal inference and demonstrate how these two subfields of statistics can inform one another. The chapters are self-contained, but they are deeply interconnected in their ideas and approach; we elaborate on these connections in the Foreword.
Chapter 1: A few years ago, the New York Department of Education (NYDE) was planning to conduct an experiment involving five new intervention programs for a selected set of New York City high schools. The goal was to estimate the causal effects of these programs and their interactions on the schools’ performance. For each school, about 50 pre-measured covariates were available. The schools could be randomly assigned to the 32 treatment combinations of this 2^5 factorial experiment, but such an allocation could have resulted in severe covariate imbalance across treatment groups. Standard methods for preventing confounding of treatment effects with covariate effects (e.g., blocking) were not straightforward to apply given the large number of covariates. In this chapter, we explore how rerandomization, a recently proposed and studied method, can be applied to this problem and to other factorial experiments. We propose a way to implement rerandomization in factorial experiments, extend the theoretical properties of rerandomization from single-factor experiments to 2^K factorial designs, and demonstrate, using the NYDE data, how such a designed experiment can improve the precision of estimated factorial effects.
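As a rough illustration of rerandomization in a 2^K factorial design, the Python sketch below draws balanced complete randomizations of N units to the 2^K treatment combinations and accepts a draw only when, for every factorial effect, the Mahalanobis distance between the covariate means at the +1 and -1 levels of that effect's contrast is at most a threshold a. The per-effect acceptance rule, the threshold, the function names, and the simulated covariates are all assumptions for illustration, not the chapter's exact criterion or the NYDE data. Because each contrast in a balanced design splits the units into two random halves, the covariate mean difference has covariance (4/N)S_X, which is where the N/4 factor comes from.

```python
import itertools
import numpy as np

def contrast_matrix(K):
    """Rows: the 2^K treatment combinations coded as -1/+1 per factor.
    Columns: the 2^K - 1 factorial effects (main effects and interactions),
    each the elementwise product of its factors' columns."""
    levels = np.array(list(itertools.product([-1, 1], repeat=K)))  # (2^K, K)
    cols = []
    for r in range(1, K + 1):
        for subset in itertools.combinations(range(K), r):
            cols.append(levels[:, list(subset)].prod(axis=1))
    return np.column_stack(cols)  # (2^K, 2^K - 1)

def mahalanobis_per_effect(X, combo, G, S_inv=None):
    """Mahalanobis distance of the covariate mean difference for each effect.
    combo[i] is the index of the treatment combination assigned to unit i."""
    N = X.shape[0]
    if S_inv is None:
        S_inv = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False)))
    dists = []
    for g in G.T:
        sign = g[combo]  # +1 or -1 for each unit under this effect's contrast
        d = X[sign == 1].mean(axis=0) - X[sign == -1].mean(axis=0)
        dists.append(N / 4 * d @ S_inv @ d)
    return np.array(dists)

def rerandomize(X, K, a, rng, max_draws=100_000):
    """Hypothetical helper: redraw until every effect's distance is <= a."""
    N = X.shape[0]
    n_combos = 2 ** K
    if N % n_combos != 0:
        raise ValueError("this sketch assumes equal cell sizes")
    G = contrast_matrix(K)
    S_inv = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False)))
    base = np.repeat(np.arange(n_combos), N // n_combos)
    for draw in range(1, max_draws + 1):
        combo = rng.permutation(base)  # balanced complete randomization
        if np.all(mahalanobis_per_effect(X, combo, G, S_inv) <= a):
            return combo, draw
    raise RuntimeError("no acceptable assignment found; consider a larger a")

# Toy usage: 64 units, simulated covariates, 2^3 design with 8 units per cell.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
combo, draws = rerandomize(X, K=3, a=2.0, rng=rng)
```

Lowering a tightens balance on every factorial effect at the cost of more draws before an assignment is accepted.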
Chapter 2: A benefit of randomized experiments is that the covariate distributions of the treatment and control groups are balanced on average, which yields simple unbiased estimators of treatment effects. However, a particular randomization may yield covariate imbalances that researchers want to address in the analysis stage through adjustment or other methods. Here we present a randomization test that conditions on covariate balance by considering only treatment assignments that are similar to the observed one in terms of covariate balance. Previous conditional randomization tests have allowed only categorical covariates, whereas our test allows covariates of any type. Through extensive simulation studies, we find that our conditional randomization test is more powerful than unconditional randomization tests and other conditional tests. Furthermore, we find that our conditional randomization test is valid (1) unconditionally across levels of covariate balance and (2) conditional on particular levels of covariate balance, whereas unconditional randomization tests are valid for (1) but not (2). Finally, we find that our conditional randomization test behaves similarly to a randomization test that uses a model-adjusted test statistic.
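A minimal Python sketch of the idea, assuming a two-arm completely randomized experiment, the sharp null hypothesis of no effect for any unit, the absolute difference in means as the test statistic, and a Mahalanobis distance as the balance measure. Keeping only random draws whose balance lies within a tolerance tol of the observed balance is one simple way to condition on covariate balance, assumed here for illustration; the chapter's exact conditioning scheme may differ, and the function names and simulated data are hypothetical.

```python
import numpy as np

def mahalanobis_balance(X, w):
    """Mahalanobis distance between treated and control covariate means.
    Under complete randomization, Cov(d) = (N / (n1 * n0)) * S_X."""
    N = len(w)
    n1 = w.sum()
    n0 = N - n1
    S_inv = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False)))
    d = X[w == 1].mean(axis=0) - X[w == 0].mean(axis=0)
    return (n1 * n0 / N) * d @ S_inv @ d

def conditional_randomization_test(y, w, X, tol, n_draws, rng):
    """Monte Carlo randomization test of the sharp null, restricted to
    assignments whose covariate balance is within tol of the observed one."""
    t_obs = abs(y[w == 1].mean() - y[w == 0].mean())
    m_obs = mahalanobis_balance(X, w)
    kept = []
    for _ in range(n_draws):
        w_star = rng.permutation(w)  # same number of treated units as observed
        if abs(mahalanobis_balance(X, w_star) - m_obs) <= tol:
            # Under the sharp null, outcomes do not change with the assignment.
            kept.append(abs(y[w_star == 1].mean() - y[w_star == 0].mean()))
    if not kept:
        raise RuntimeError("no draws met the balance condition; increase tol or n_draws")
    kept = np.array(kept)
    # The observed assignment satisfies its own condition, so count it once.
    p_value = (1 + np.sum(kept >= t_obs)) / (1 + len(kept))
    return p_value, len(kept)

# Toy usage: simulated covariates and outcomes with no treatment effect.
rng = np.random.default_rng(1)
N = 100
X = rng.normal(size=(N, 2))
w = rng.permutation(np.repeat([0, 1], N // 2))
y = X @ np.array([1.0, -0.5]) + rng.normal(size=N)
p, n_kept = conditional_randomization_test(y, w, X, tol=1.0, n_draws=2000, rng=rng)
```

Widening tol admits more assignments; as tol grows without bound, every draw is kept and the sketch reduces to an ordinary unconditional randomization test.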
Chapter 3: Causal analyses of observational studies are often complicated by covariate imbalances among treatment groups, and matching methods alleviate this complication by finding subsets of treatment groups that exhibit covariate balance. However, standard methods for conducting causal analyses after matching assume completely randomized, blocked, or paired designs, which do not fully condition on the balance that modern matching algorithms provide. We find that, as a result, standard approaches can be unnecessarily conservative. We develop an alternative approach that more directly leverages covariate balance in the design and analysis stages of observational studies. In the design stage, we propose a randomization test that, unlike common balance diagnostics in the literature, is a valid test of the ubiquitous assumption that a matched dataset approximates an experimental design. Thus, our test can be used to validly determine a plausible design for a matched dataset. Then, in the analysis stage, we provide a treatment effect estimation strategy that uses the design deemed plausible by our test. Through simulation and a real application in political science, we find that our approach yields more precise causal analyses than standard approaches by conditioning on the high levels of balance that researchers, often using subject-matter expertise, achieve by design in matched datasets.||