Publication:
Inferring Conditional Dependencies in Observational Studies: Nuisance Function Tuning and Transfer Learning

Date

2024-05-31

The Harvard community has made this article openly available.

Citation

McGrath, Sean. 2024. Inferring Conditional Dependencies in Observational Studies: Nuisance Function Tuning and Transfer Learning. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Conditional dependence estimation problems are ubiquitous in the statistical inference literature. Two questions that arise in such problems are (i) how to estimate nuisance functions for optimal downstream inference on the target parameter and (ii) how to integrate data from heterogeneous populations to improve inference in a given target population. This work takes steps toward addressing these two questions in Parts 1 and 2, respectively.

In Part 1, we consider how to estimate nuisance functions to obtain optimal rates of convergence for a doubly robust nonparametric functional that has been applied across the causal inference and conditional independence testing literature. We illustrate how the type of estimator, the tuning parameter choices for the nuisance functions, and the sample splitting strategy jointly affect the optimal rate of estimating the functional of interest. We split Part 1 into Chapters 1 and 2. In Chapter 1, we analyze four estimators (three plug-in estimators and the first-order estimator), each under two different sample splitting strategies, all under the assumption that the covariate density is known. We show that the nuisance function estimators must be undersmoothed under low regularity conditions to obtain optimal rates of convergence for the functional of interest. With suitable nuisance function tuning and sample splitting, we show that some of these estimators can achieve minimax rates of convergence in all Hölder smoothness classes of the nuisance functions. In Chapter 2, we extend the results of Chapter 1 to the case where the covariate density is unknown. This case presents several subtleties in terms of the information-theoretic limits of the problem, the nature of undersmoothing and sample splitting, and the choice of estimators of the nuisance functions. We consider a sample splitting strategy in which all the nuisance functions are estimated in separate subsamples. For each of the four types of estimators considered in Chapter 1, modified for the unknown density case, we illustrate that the nuisance function estimators must be undersmoothed in the non-root-n regime to obtain optimal rates of convergence for the functional of interest.

In Part 2 (Chapter 3), we consider the problem of leveraging data from a source population to improve estimation of a low-rank matrix in an underrepresented target population. One example is estimating the associations between genetic variants and diseases in non-European ancestry groups. We propose an approach that leverages similarity in the latent row and column spaces between the source and target populations to improve estimation in the target population, which we refer to as LatEnt spAce-based tRaNsfer lEaRning (LEARNER). In a simulation study, LEARNER often outperformed benchmark approaches that use only the target population data, especially as the sample size from the source population increased. We also performed an illustrative application and empirical comparison of LEARNER and benchmark approaches in a re-analysis of a genome-wide association study in the BioBank Japan cohort.

Keywords

doubly robust estimation, latent space, nonparametric functional estimation, nuisance function tuning, sample splitting, transfer learning, Biostatistics, Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
