Publication:

Causal Inference Methods for High-Resolution Data: Methodological Innovations for Estimating Treatment Effects in Complex Data Structures

Date

2025-05-15

Citation

Meng, Xiang. 2025. Causal Inference Methods for High-Resolution Data: Methodological Innovations for Estimating Treatment Effects in Complex Data Structures. Doctoral Dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

This dissertation develops statistical methods for high-resolution data across three distinct but complementary domains. Here, "high-resolution data" refers to information structures with granular detail that traditional statistical methods often simplify or ignore: temporally dense decision points, precise treatment timing that defines the causal question itself, and detailed within-cluster distributions that standard approaches typically collapse into simple averages.

The first chapter evaluates a sequential risk time sampling algorithm implemented in a mobile health intervention study. This algorithm addresses the challenge of delivering interventions across 144 potential decision points per day while maintaining both treatment frequency and uniform distribution constraints. The analysis demonstrates that the algorithm successfully balances these objectives, enabling valid causal inference for identifying contexts where interventions prove beneficial in chronic disease management.

The second chapter integrates staggered adoption designs with survival analysis to estimate causal effects when treatment timing varies across subjects. The treatment variable in causal inference is problem-defining, and when it has granular temporal structure (such as exact transplant dates), it fundamentally reshapes the causal question rather than merely adding predictive power. By combining hazard-based modeling with double machine learning, this approach maintains robustness to model misspecification while providing interpretable treatment effect estimates. Applications to heart transplant data demonstrate superior performance compared to traditional methods, with extensions to business contexts where the timing of customer actions has causal implications.

The third chapter introduces a variance estimator for matching methods that remains valid even when matched samples substantially overlap, a common challenge when treatment groups are small. While the granular within-cluster information has always existed in matched datasets, this approach uniquely leverages the full distribution of control outcomes within each matched set, outperforming existing methods in simulation studies and providing a generalized theoretical framework applicable to various matching procedures. The methodological advance enables more reliable inference for policy evaluations across economics, education, and public health.

Together, these three studies advance causal inference methodology by designing statistical approaches that properly leverage high-resolution data structures, providing researchers with practical tools for deriving valid insights from increasingly complex and detailed data while improving both empirical accuracy and decision-making relevance.

Keywords

Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
