Methods for Effectively Combining Group- and Individual-Level Data
Citation: Smoot, Elizabeth. 2015. Methods for Effectively Combining Group- and Individual-Level Data. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract: In observational studies, researchers often have access to multiple sources of information but ultimately apply well-established statistical methods that do not take advantage of the full range of information available. In this dissertation I discuss three methods that incorporate this additional information and show how each improves the quality of the analysis.
First, in Chapters 1 and 2, I focus on methods for improving estimator efficiency in studies in which both population (group) and individual-level data are available. In such settings, the hybrid design for ecological inference efficiently combines the two sources of information; however, in practice, maximizing the likelihood is often computationally intractable. I propose and develop an alternative, computationally efficient representation of the hybrid likelihood. I then demonstrate that this approximation incurs no penalty in terms of increased bias or reduced efficiency.
Second, in Chapters 3 and 4, I highlight the problem of applying standard analyses to outcome-dependent sampling schemes in settings in which study units are cluster-correlated. I demonstrate that incorporating known outcome totals into the likelihood via inverse probability weights results in valid estimation and inference. I further discuss the applicability of outcome-dependent sampling schemes in resource-limited settings, specifically to the analysis of national ART programs in sub-Saharan Africa. I propose the cluster-stratified case-control study as a valid and logistically reasonable study design in such resource-poor settings, discuss balanced versus unbalanced sampling techniques, and address the practical trade-off between logistical considerations and statistical efficiency of cluster-stratified case-control versus case-control studies.
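As a rough illustration of the inverse probability weighting idea mentioned above, the following minimal sketch weights each sampled unit by the known population total for its outcome group divided by the number sampled from that group. It is not the dissertation's estimator: it ignores cluster correlation entirely, and the function names (`naive_risk`, `ipw_risk`) and all counts are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def naive_risk(sample):
    """Unweighted risk among the exposed; biased under case-control sampling."""
    exposed = [y for y, x in sample if x == 1]
    return sum(exposed) / len(exposed)


def ipw_risk(sample, totals):
    """Risk among the exposed, weighting each unit by N_y / n_y.

    sample: list of (y, x) pairs with y = outcome (0/1), x = exposure (0/1)
    totals: known population outcome totals, e.g. {0: N0, 1: N1}
    """
    # Count how many units were sampled from each outcome group.
    counts = {0: 0, 1: 0}
    for y, _ in sample:
        counts[y] += 1
    # Inverse sampling probability for each outcome group.
    weights = {y: totals[y] / counts[y] for y in counts}
    numerator = sum(weights[y] for y, x in sample if x == 1 and y == 1)
    denominator = sum(weights[y] for y, x in sample if x == 1)
    return numerator / denominator


# Hypothetical population: 1,000 cases (600 exposed), 9,000 controls (1,800 exposed),
# so the population risk among the exposed is 600 / 2,400.
totals = {1: 1000, 0: 9000}

# Hypothetical case-control sample: 100 cases and 100 controls that mirror
# the population exposure proportions within each outcome group.
sample = (
    [(1, 1)] * 60 + [(1, 0)] * 40   # cases: 60 exposed, 40 unexposed
    + [(0, 1)] * 20 + [(0, 0)] * 80  # controls: 20 exposed, 80 unexposed
)

print("naive:", naive_risk(sample))
print("weighted:", ipw_risk(sample, totals))
```

In this toy setup the unweighted estimate overstates the risk because cases are heavily oversampled, while the weighted estimate recovers the population value used to construct the example.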
Finally, in Chapter 5, I demonstrate the benefit of incorporating the full range of possible outcomes into an observational data analysis, as opposed to running the analysis on a pre-selected set of outcomes. Testing all possible outcomes for associations with the exposure inherently incorporates negative controls into the analysis and further validates a study's statistically significant results. I apply this technique to an investigation of the relationship between particulate air pollution and hospital admission causes.
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:17463969