Publication:

Contributions to Semiparametric Methods for Incomplete Data

Date

2017-05-10

Citation

Evans, Katherine Louise. 2017. Contributions to Semiparametric Methods for Incomplete Data. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

Chapter 1: The effect of treatment on the treated (ETT) is a common parameter of interest in causal inference. Identification of the ETT typically relies on an assumption of no unobserved confounding. When information on a subset of potential confounders is not observed in a main study, external data from a validation study with more detailed confounding information may be used, under certain assumptions, to help mitigate confounding. In the absence of missing data, a common approach to account for confounding is based on the propensity score. Recently, such methods have been extended to address missing confounder data in a main-validation study context; however, existing methods rely on overly restrictive assumptions that are unlikely to hold in practice. To address this problem, we develop a novel approach that constructs an extended propensity score (EPS), which preserves the essential properties of a standard propensity score but has the additional advantage that it can be evaluated even for subjects with missing confounders. The approach is universal in the sense that it applies to virtually any outcome scale, whether binary, polytomous, or continuous. The finite sample performance of the proposed approach is carefully evaluated and compared to several existing methods in extensive simulation studies. The proposed EPS approach is also illustrated in an application examining the effect of surgical resection on survival time among 14,312 Medicare beneficiaries with malignant neoplasm of the brain, with 2,391 patients in SEER-Medicare serving as the validation study.

Chapter 2: Missing data and confounding are two problems researchers face in observational studies for comparative effectiveness. Williamson et al. (2012) recently proposed a unified approach to handle both issues concurrently using a multiply robust (MR) methodology under the assumption that confounders are missing at random. Their approach considers a union of models in which any submodel has a parametric component while the remaining models are unrestricted. We show that while their estimating function is MR in theory, the possibility of multiply robust inference is complicated by the fact that parametric models for different components of the union model are not variation independent, and therefore the MR property is unlikely to hold in practice. To address this, we propose an alternative, transparent parametrization of the likelihood function, which makes explicit the model dependencies between the various nuisance functions needed to evaluate the MR efficient score. The proposed method is genuinely doubly robust (DR) in that it is consistent and asymptotically normal if one of two sets of modeling assumptions holds, and we establish that, in a sense, this is the best one can achieve in this framework. We evaluate the performance of the DR method via a simulation study.

Chapter 3: This chapter investigates the problem of making inference about a parametric model for the regression of an outcome variable Y on covariates (V,L) when data are fused from two separate sources, one of which contains information only on (V,Y) while the other contains information only on the covariates (V,L). This data fusion setting may be viewed as an extreme form of missing data in which the probability of observing complete data (V,L,Y) on any given subject is zero. We develop a large class of semiparametric twin inverse probability weighting (TIPW) estimators, which includes doubly robust (DR) estimators, of the regression coefficients in fused data. The proposed method is DR in that it is consistent and asymptotically normal if, in addition to the model of interest, we correctly specify a model for either the data source process under an ignorability assumption or the distribution of the unobserved covariates. We evaluate the performance of the proposed methodologies via an extensive simulation study.

Keywords

semiparametric, causal inference, missing data, propensity score, doubly robust

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
