Publication: Contributions to Semiparametric Methods for Incomplete Data
Abstract
Chapter 1: The effect of treatment on the treated (ETT) is a common parameter of interest in causal inference. Identification of the ETT typically relies on an assumption of no unobserved confounding. When information on a subset of potential confounders is not observed in a main study, external data from a validation study with more detailed confounding information may be used, under certain assumptions, to help mitigate confounding. In the absence of missing data, a common approach to account for confounding is based on the propensity score. Recently, such methods have been extended to address missing confounder data in a main-validation study context; however, existing methods rely on overly restrictive assumptions that are unlikely to hold in practice. To address this problem, we develop a novel approach that constructs an extended propensity score (EPS), which preserves the essential properties of a standard propensity score but can also be evaluated for subjects with missing confounders. The approach is universal in the sense that it applies to virtually any outcome scale, whether binary, polytomous, or continuous. The finite-sample performance of the proposed approach is evaluated and compared to several existing methods in extensive simulation studies. The EPS approach is also illustrated in an application examining the effect of surgical resection on survival time among 14,312 Medicare beneficiaries with malignant neoplasm of the brain, using 2,391 patients in SEER-Medicare as the validation study.

Chapter 2: Missing data and confounding are two problems researchers face in observational studies of comparative effectiveness. Williamson et al. (2012) recently proposed a unified approach to handle both issues concurrently using a multiply robust (MR) methodology under the assumption that confounders are missing at random. Their approach considers a union of models in which any submodel has a parametric component while the remaining models are unrestricted. We show that, while their estimating function is MR in theory, multiply robust inference is complicated by the fact that the parametric models for different components of the union model are not variation independent; the MR property is therefore unlikely to hold in practice. To address this, we propose an alternative, transparent parametrization of the likelihood function that makes explicit the model dependencies between the various nuisance functions needed to evaluate the MR efficient score. The proposed method is genuinely doubly robust (DR) in that it is consistent and asymptotically normal if one of two sets of modeling assumptions holds, and we establish that, in a sense, this is the best one can achieve in this framework. We evaluate the performance of the DR method via a simulation study.

Chapter 3: This chapter investigates the problem of making inference about a parametric model for the regression of an outcome variable Y on covariates (V,L) when data are fused from two separate sources, one of which contains information only on (V,Y) while the other contains information only on the covariates (V,L). This data fusion setting may be viewed as an extreme form of missing data in which the probability of observing complete data (V,L,Y) on any given subject is zero. We develop a large class of semiparametric twin inverse probability weighting (TIPW) estimators of the regression coefficients in fused data, which includes doubly robust (DR) estimators. The proposed method is DR in that it is consistent and asymptotically normal if, in addition to the model of interest, we correctly specify a model for either the data source process under an ignorability assumption or the distribution of the unobserved covariates. We evaluate the performance of the proposed methodologies via an extensive simulation study.
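
As background for the Chapter 1 setting, the sketch below illustrates the standard propensity-score approach to the ETT when all confounders are fully observed. It is not the EPS method developed in the dissertation. It uses a generic odds-weighting estimator in which controls are reweighted by e(X)/(1 - e(X)), where e(X) is the propensity score. The function name `ett_by_odds_weighting`, the toy data, and the assumption that the propensity scores are already known (rather than estimated) are all hypothetical choices made for illustration.

```python
import numpy as np


def ett_by_odds_weighting(y, a, ps):
    """Generic odds-weighting estimate of the ETT, E[Y(1) - Y(0) | A = 1].

    Illustrative only: assumes no unobserved confounding, fully observed
    confounders, and propensity scores `ps` supplied by the caller.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=int)
    ps = np.asarray(ps, dtype=float)
    if np.any((ps <= 0.0) | (ps >= 1.0)):
        raise ValueError("propensity scores must lie strictly between 0 and 1")

    treated = a == 1
    control = a == 0

    # Mean observed outcome among the treated estimates E[Y(1) | A = 1].
    mean_treated = y[treated].mean()

    # Controls are reweighted by the propensity odds so that their
    # confounder distribution matches that of the treated group.
    odds = ps[control] / (1.0 - ps[control])
    mean_control_reweighted = np.sum(odds * y[control]) / np.sum(odds)

    return mean_treated - mean_control_reweighted


# Hypothetical toy data: one binary confounder x, true treatment effect of 2.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
a = np.array([1, 0, 0, 0, 1, 1, 1, 0])
y = 10.0 * x + 2.0 * a                 # outcome depends on x and treatment
ps = np.where(x == 1, 0.75, 0.25)      # treatment is more likely when x = 1

naive = y[a == 1].mean() - y[a == 0].mean()
weighted = ett_by_odds_weighting(y, a, ps)
print(f"naive difference: {naive:.2f}, odds-weighted ETT: {weighted:.2f}")
```

In this toy example the unadjusted difference in means is confounded by `x`, while the odds-weighted contrast recovers the built-in effect of 2. The estimator needs `ps` for every subject, which is exactly what is unavailable when some confounders are missing in the main study.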