Publication:
Robust Semi-Parametric Inference in Semi-Supervised Settings

Date

2016-05-17

Citation

Chakrabortty, Abhishek. 2016. Robust Semi-Parametric Inference in Semi-Supervised Settings. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

In this dissertation, we consider semi-parametric estimation problems under semi-supervised (SS) settings, wherein the available data consist of a small or moderate-sized labeled dataset (L) and a much larger unlabeled dataset (U). Such data arise naturally in settings where the outcome, unlike the covariates, is expensive to obtain, a frequent scenario in modern studies involving large electronic databases. In SS settings, it is often of interest to investigate whether and when U can be exploited to improve estimation efficiency relative to supervised estimators based on L alone. In Chapter 1, we propose a class of Efficient and Adaptive Semi-Supervised Estimators (EASE) for linear regression. These are semi-non-parametric, imputation-based two-step estimators that are adaptive to model mis-specification: they achieve improved efficiency when the linear model is mis-specified and equal (optimal) efficiency when it holds. This adaptive property is crucial for advocating safe use of U. We provide asymptotic results establishing our claims, followed by simulations and an application to real data. In Chapter 2, we provide a unified framework for SS M-estimation problems based on general estimating equations, and propose a family of EASE estimators that are always at least as efficient as the supervised estimator and strictly more efficient whenever U is actually informative for the parameter of interest. For a subclass of problems, we also provide a flexible semi-non-parametric imputation strategy for constructing EASE, again supported by asymptotic results, simulations, and an application to real data. In Chapter 3, we consider regressing a binary outcome (Y) on covariates (X) based on a large unlabeled dataset with observations only for X and, additionally, a surrogate (S) that can predict Y with high accuracy when it takes extreme values. Assuming that Y and S both follow single-index models in X, we show that, under sparsity assumptions, the regression parameter of Y versus X can be recovered through a least-squares LASSO estimator based on the subset of the data restricted to the extreme sets of S, with Y imputed using the surrogacy of S. We provide sharp finite-sample performance guarantees for this estimator, followed by simulations and an application to real data.
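The two-step, imputation-based idea behind the SS estimators of Chapters 1 and 2 can be sketched numerically. The sketch below is illustrative only and is not the dissertation's EASE construction: the quadratic imputation basis, the simulated data, and all variable names are assumptions made for the example. A flexible model fit on the labeled data L imputes outcomes on the unlabeled data U, and the working linear model is then refit on the imputed data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Labeled data L (small) and unlabeled data U (large). The true outcome
# model is nonlinear, so the working linear model is mis-specified.
n, N = 200, 20000
X_L = rng.normal(size=(n, 1))
Y_L = X_L[:, 0] + 0.5 * X_L[:, 0] ** 2 + rng.normal(size=n)
X_U = rng.normal(size=(N, 1))

def ols(X, y):
    """Least-squares fit with an intercept column; returns coefficients."""
    Z = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Z, y, rcond=None)[0]

# Step 1: fit a flexible (here: quadratic-basis) imputation model on L,
# then impute outcomes for every observation in U.
B_L = np.column_stack([X_L, X_L ** 2])
gamma = ols(B_L, Y_L)
B_U = np.column_stack([X_U, X_U ** 2])
Y_hat_U = np.column_stack([np.ones(N), B_U]) @ gamma

# Step 2: refit the working linear model on U using the imputed outcomes.
beta_ss = ols(X_U, Y_hat_U)

# Supervised benchmark: the working linear model fit on L only.
beta_sup = ols(X_L, Y_L)

print("supervised  :", beta_sup)
print("semi-superv.:", beta_ss)
```

Because the imputation basis captures the nonlinearity, the refit slope targets the same best linear projection as the supervised fit but leans on the much larger unlabeled sample; the dissertation's contribution lies in making such gains adaptive and safe, so that efficiency is never lost when the working model happens to be correct.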

Keywords

Statistics, Biology, Biostatistics, Mathematics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
