Statistical Inference for Causal Mechanisms: Mediation and Interference
FULCHER-DISSERTATION-2019.pdf (8.135Mb)(embargoed until: 2021-03-01)
Fulcher, Isabel Rose
Citation: Fulcher, Isabel Rose. 2019. Statistical Inference for Causal Mechanisms: Mediation and Interference. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract: Randomized clinical trials have been instrumental for identifying causal effects leading to life-saving treatments for patients, improved protocols in health systems, and effective social and educational programs for underprivileged populations. However, the standard randomized experiment is often impractical, unethical, or insufficient such that observational studies or more complex experimental designs must be used. When randomization is not possible, the causal effect of an exposure may be clouded by a web of confounding variables. Even in cases when treatment assignment is under investigator control, randomization may be insufficient to control for sources of confounding between post-treatment variables or social interactions between individuals.
The existing causal inference literature provides a formal framework for inferring causation in complex or non-randomized settings; however, this often requires stringent assumptions, such as (1) all common causes of the exposure and outcome variables are measured and (2) exposures received by one person must not be related to another person's outcome. These two assumptions are rarely both met due to the interplay of social, economic, and geographical determinants of health present in most observational research settings. This dissertation provides formal approaches to elucidate causal mechanisms for several research queries in the presence of unobserved confounding or interference between individuals.
The first half of the dissertation focuses on causal mediation analysis in the presence of unobserved confounding. Causal mediation analysis seeks to explain the underlying relationship between an exposure and outcome through an intermediate variable. That is, beyond evaluating the total effect of the exposure on the outcome, one aims to evaluate the indirect effect of the exposure on the outcome through a given mediator and the direct effect of the exposure on the outcome, not through the mediator. Natural (pure) direct and indirect effects have emerged as the most common form of causal effects in modern mediation analysis. Recent advances in this area have established formal conditions for identification and estimation of natural direct and indirect effects. However, these conditions involve stringent assumptions: that there is no unmeasured confounding and that the mediator has been measured without error. These assumptions may fail to hold in the practical settings where mediation methods are often applied.
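For readers who want the decomposition in symbols, the following is a sketch in one standard counterfactual notation. The symbols are assumptions introduced here and do not appear in the abstract: Y(a, m) is the outcome under exposure a and mediator value m, and M(a) is the mediator under exposure a. For a binary exposure, the total effect splits into a natural indirect and a natural direct effect:

```latex
\begin{align*}
\mathrm{TE}
  &= E\{Y(1, M(1))\} - E\{Y(0, M(0))\} \\
  &= \underbrace{E\{Y(1, M(1))\} - E\{Y(1, M(0))\}}_{\text{natural indirect effect}}
   + \underbrace{E\{Y(1, M(0))\} - E\{Y(0, M(0))\}}_{\text{natural direct effect}}
\end{align*}
```

The second line adds and subtracts E{Y(1, M(0))}, so the two components sum to the total effect by construction. The dissertation's own notation and choice of reference levels may differ.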
In Chapter 1, we demonstrate that the natural indirect effect can in fact be identified under less stringent conditions than previously thought. Specifically, we establish that the natural indirect effect is identified in the presence of unmeasured exposure-outcome confounding provided there is no additive interaction between the mediator and unmeasured confounder(s). Further, we present a new estimator for the natural indirect effect that is robust to both unmeasured exposure-outcome and mediator-outcome confounding and classical measurement error in the mediator. This result is particularly relevant to randomized studies where the mediator-outcome relationship is the only association that is subject to potential confounding bias. In Chapter 2, we introduce a novel form of indirect effect, which we call the population intervention indirect effect (PIIE). This new type of indirect effect captures the extent to which the effect of exposure is mediated by an intermediate variable under an intervention which fixes the component of exposure directly influencing the outcome at its observed value. Thus, the PIIE is attractive for settings with a harmful exposure where one may not be interested in an intervention that forces a person to be exposed. In addition, the PIIE is empirically identified whether or not unmeasured exposure-outcome confounding exists, and unlike the natural indirect effect, it requires no additional assumption. Interestingly, our identification criterion relaxes Judea Pearl's front-door criterion, as it does not require the absence of a direct effect of exposure that is not mediated by the intermediate variable. For estimation of the PIIE, we provide parametric and semiparametric estimators, including a doubly robust semiparametric locally efficient estimator, that perform very well in simulation studies.
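One possible way to write the verbal definition of the PIIE in counterfactual notation is sketched below. The symbols are assumptions introduced here: A is the observed exposure, a* is a reference exposure level, Y(a, m) is the outcome under exposure a and mediator value m, and M(a) is the mediator under exposure a. The component of exposure acting directly on the outcome is held at its observed value A, and only the mediator's exposure level changes:

```latex
\begin{align*}
\mathrm{PIIE}(a^{*})
  &= E\{Y(A, M(A))\} - E\{Y(A, M(a^{*}))\} \\
  &= E(Y) - E\{Y(A, M(a^{*}))\}
\end{align*}
```

The second line assumes consistency, so that Y(A, M(A)) equals the observed outcome Y. This is a reading of the abstract's description, not a quotation of the dissertation's formal definition.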
The latter half of the dissertation is concerned with the evaluation of causal effects on a network of interconnected units. In a network setting, the outcome of one unit may affect the outcome of another unit in the network, even if the units are not directly connected. Furthermore, the outcome of one unit may also be influenced by the exposure of another unit -- a phenomenon known formally as interference. In the presence of interference, causal inference is rendered significantly more complex due to non-trivial dependence between units. In spite of this challenge, the presence of interference also gives rise to new causal estimands of interest known as direct, spillover, and overall effects. Much of the prior literature on interference requires that units can be partitioned into non-overlapping groups or clusters such that the outcomes of units in separate groups are independent, an assumption known as partial interference. This assumption may be appropriate for well-defined and distinct clusters, such as geographically distant villages, schools, or hospitals. However, the partial interference assumption will not hold when units may have multiple, overlapping relationships that cannot be reasonably partitioned.
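As a toy illustration of direct and spillover effects under partial interference, the sketch below simulates clusters of two units with independently randomized exposures and estimates both effects by differences in cell means. Everything in it is hypothetical and chosen only for illustration: the linear outcome model, the effect sizes DIRECT and SPILLOVER, and the function names. It is not the dissertation's method, which targets networks where this kind of clean partition is unavailable.

```python
import random

# Hypothetical true effects, used only for this illustration.
DIRECT = 2.0     # effect of a unit's own exposure on its own outcome
SPILLOVER = 0.5  # effect of the partner's exposure on the unit's outcome


def simulate_pairs(n_pairs, seed=0):
    """Simulate two-unit clusters with independently randomized exposures.

    Returns (a_own, a_partner, y) tuples for the first unit of each pair.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_pairs):
        a_own = rng.randint(0, 1)      # exposure of the index unit
        a_partner = rng.randint(0, 1)  # exposure of its partner
        noise = rng.gauss(0.0, 1.0)
        # Assumed linear outcome model: the partner's exposure enters the
        # index unit's outcome, which is what interference means here.
        y = 1.0 + DIRECT * a_own + SPILLOVER * a_partner + noise
        rows.append((a_own, a_partner, y))
    return rows


def cell_mean(rows, a_own, a_partner):
    """Mean outcome among units with the given own and partner exposure."""
    ys = [y for (a, b, y) in rows if a == a_own and b == a_partner]
    return sum(ys) / len(ys)


def estimate_effects(rows):
    """Estimate direct and spillover effects relative to the (0, 0) cell."""
    baseline = cell_mean(rows, 0, 0)
    direct_hat = cell_mean(rows, 1, 0) - baseline     # own exposure changes
    spillover_hat = cell_mean(rows, 0, 1) - baseline  # partner exposure changes
    return direct_hat, spillover_hat


if __name__ == "__main__":
    d, s = estimate_effects(simulate_pairs(40000))
    print(f"direct estimate: {d:.2f}, spillover estimate: {s:.2f}")
```

The cell-mean contrasts are unbiased here only because exposures are randomized and pairs are independent of one another. With overlapping relationships, units can no longer be grouped into independent clusters, and this simple estimator no longer applies.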
In Chapter 3, we describe the auto-g-computation approach for evaluating causal effects of interventions on a network of interconnected units in the presence of long-range outcome dependence between two units and arbitrary forms of interference. The proposed method places no a priori restrictions on the network structure but requires that the outcome, treatment, and covariates can be viewed as single realizations of a conditional Markov random field induced by a certain chain graph defined on the network. The proposed approach to inference relies on positing certain parametric auto-models, first proposed by Besag, which are estimated using a coding maximum likelihood approach. Unfortunately, the coding estimator can be inefficient. In Chapter 4, we develop a Bayesian auto-g-computation algorithm which recovers information not used by auto-g-computation by incorporating data on the entire network. Through simulations, we demonstrate that the proposed Bayesian estimator is substantially more efficient than the existing auto-g-computation estimator. Bayesian auto-g-computation is used to evaluate the effect of prior incarceration on HIV, STI, and Hepatitis C prevalence on a sexual and injection drug use network.
Taken together, the developments from this dissertation expand the toolbox for estimating causal effects from observational studies and network data. Importantly, this body of statistical theory has been implemented in multiple readily usable, open-source software packages in the R and C++ programming languages. In addition to the applications discussed herein, the range of methods and tools will be primed to address additional scientific queries in medicine and social science.
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:41121296
- FAS Theses and Dissertations