Bayesian Methods and Computation for Large Observational Datasets

Title: Bayesian Methods and Computation for Large Observational Datasets
Author: Watts, Krista Leigh
Citation: Watts, Krista Leigh. 2013. Bayesian Methods and Computation for Large Observational Datasets. Doctoral dissertation, Harvard University.
Abstract: Much health-related research depends heavily on the analysis of a rapidly expanding universe of observational data. A challenge in the analysis of such data is the lack of sound statistical methods and tools that can address multiple facets of estimating treatment or exposure effects in observational studies with a large number of covariates. We sought to advance methods for analyzing large observational datasets, with the end goal of understanding the effect of treatments or exposures on health. First, we compared existing methods for propensity score (PS) adjustment, specifically Bayesian propensity scores. This concept had previously been introduced (McCandless et al., 2009), but the impact of feedback when fitting the joint likelihood for both the PS and outcome models had not been rigorously evaluated. We determined that unless specific steps are taken to mitigate its impact, feedback has the potential to distort estimates of the treatment effect. Next, we developed a method for accounting for uncertainty in confounding adjustment in the context of multiple exposures. Our method selects confounders based on their association with the joint exposure and the outcome while also accounting for the uncertainty in the confounding adjustment. Finally, we developed two methods to combine heterogeneous sources of data for effect estimation: specifically, a primary data source that provides information on treatments, outcomes, and a limited set of measured confounders for a large number of people, and smaller supplementary data sources containing a much richer set of covariates. Our methods avoid the need to specify the full joint distribution of all covariates.
Terms of Use: This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at