Cluster-based outcome-dependent sampling: inference and frameworks for efficient sampling designs

Sauer, Sara

Citation

Sauer, Sara. 2021. Cluster-based outcome-dependent sampling: inference and frameworks for efficient sampling designs. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Efficient sampling designs are valuable in public health research when finite resources necessitate decisions regarding which individuals to sample for detailed data collection. In observational studies, when the outcome is rare, outcome-dependent sampling (ODS) is a cost-efficient strategy that leverages information on the subject outcomes at the design stage to inflate the outcome rate in the sample and thereby increase statistical efficiency. In many settings, the individuals in the target population are clustered, as are patients in health centers, and therefore exhibit cluster-correlation in their outcomes. Logistical, ethical, or resource constraints may require sampling clusters rather than individuals directly. In such settings, the question becomes which clusters should be sampled to yield the most 'informative' sample for the research question of interest.

This dissertation focuses on the design and analysis of cluster-based ODS designs, in which cluster-level summaries of the outcome, possibly together with other readily available cluster-level information from sources such as a country's Health Management Information System (HMIS), are used to guide the decision regarding which clusters to sample. In particular, this dissertation proposes i) methods for valid estimation and inference given data collected through a cluster-based ODS design when the number of sampled clusters is small, and ii) a framework for constructing efficient cluster-based ODS designs when interest lies in estimating with precision one or multiple parameters in a marginal mean model.

In Chapter 1, I propose to carry out inference given data collected through a cluster-based ODS scheme using inverse-probability-weighted generalized estimating equations (IPW-GEE), where the cluster-specific weights are the inverse of a cluster's probability of selection into the sample. I provide a detailed treatment of the asymptotic properties of this estimator, together with an explicit expression for the asymptotic variance and a corresponding estimator. Furthermore, motivated by a study on risk factors for low birthweight in Rwanda, I propose a number of small-sample bias corrections to the point estimates and standard error estimates. In Chapter 2, I develop an approach for optimal allocation in single-stage stratified cluster-based ODS designs and investigate the potential for gains in statistical efficiency under such a design given one or multiple parameters of interest. As the optimal allocation formulae presented in Chapter 2 depend on quantities that are unknown in practice, Chapter 3 proposes and evaluates an adaptive sampling strategy for operationalizing the optimal allocation design in practice. Finally, in Chapter 4 I give concluding remarks and present some directions for future work.
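
To make the Chapter 1 estimator concrete, the following is a minimal Python sketch of an IPW-GEE fit. It is not the dissertation's implementation. It assumes a logistic marginal mean model, an independence working correlation, known cluster selection probabilities, and a standard cluster-robust sandwich variance; it omits the small-sample bias corrections and does not reproduce the dissertation's asymptotic variance expression. The function name `ipw_gee_logistic` and its arguments are hypothetical.

```python
import numpy as np


def ipw_gee_logistic(X, y, cluster, weight, n_iter=50, tol=1e-10):
    """Sketch of IPW-GEE for a logistic marginal model (assumptions in lead-in).

    X       : (n, p) design matrix, one row per individual
    y       : (n,) binary outcomes
    cluster : (n,) cluster label of each individual
    weight  : (n,) inverse of the selection probability of the individual's
              cluster (the same value repeated within a cluster)
    Returns (beta_hat, robust_cov).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weight, dtype=float)
    cluster = np.asarray(cluster)

    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        # Weighted estimating function under working independence.
        score = X.T @ (w * (y - mu))
        # Weighted derivative matrix (the "bread" before inversion).
        info = (X * (w * mu * (1.0 - mu))[:, None]).T @ X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    # Recompute pieces at the final estimate for the sandwich variance.
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    info = (X * (w * mu * (1.0 - mu))[:, None]).T @ X
    bread = np.linalg.inv(info)

    # "Meat": outer products of weighted cluster-level score contributions.
    meat = np.zeros_like(info)
    for k in np.unique(cluster):
        idx = cluster == k
        u_k = X[idx].T @ (w[idx] * (y[idx] - mu[idx]))
        meat += np.outer(u_k, u_k)

    return beta, bread @ meat @ bread
```

With an intercept-only design matrix, the fitted coefficient reduces to the logit of the weighted outcome proportion, which gives a quick sanity check on the weighting.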

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Citable link to this page

https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37368479
