Publication: Causal Inference Methods for High-Resolution Data: Methodological Innovations for Estimating Treatment Effects in Complex Data Structures
Abstract
This dissertation develops statistical methods for high-resolution data across three distinct but complementary domains. In this context, "high-resolution data" refers to information structures with granular detail that traditional statistical methods often simplify or ignore—whether in the form of temporally dense decision points, precise treatment timing that defines the causal question itself, or detailed within-cluster distributions that standard approaches typically collapse into simple averages.
The first chapter evaluates a sequential risk time sampling algorithm implemented in a mobile health intervention study. The algorithm addresses the challenge of delivering interventions across 144 potential decision points per day while satisfying both treatment-frequency and uniform-distribution constraints. The analysis demonstrates that the algorithm balances these objectives, enabling valid causal inference about the contexts in which interventions prove beneficial in chronic disease management.
The second chapter integrates staggered adoption designs with survival analysis to estimate causal effects when treatment timing varies across subjects. In causal inference the treatment variable defines the problem: when it has granular temporal structure (such as exact transplant dates), it fundamentally reshapes the causal question rather than merely adding predictive power. By combining hazard-based modeling with double machine learning, the approach remains robust to model misspecification while providing interpretable treatment effect estimates. Applications to heart transplant data demonstrate superior performance compared to traditional methods, and the approach extends to business contexts where the timing of customer actions has causal implications.
The third chapter introduces a variance estimator for matching methods that remains valid even when matched samples substantially overlap, a common challenge when treatment groups are small. Although granular within-cluster information has always existed in matched datasets, this approach uniquely leverages the full distribution of control outcomes within each matched set. It outperforms existing methods in simulation studies and provides a generalized theoretical framework applicable to various matching procedures. This methodological advance enables more reliable inference for policy evaluations across economics, education, and public health.
Together, these three studies advance causal inference methodology by designing statistical approaches that properly leverage high-resolution data structures, providing researchers with practical tools for deriving valid insights from increasingly complex and detailed data while improving both empirical accuracy and decision-making relevance.