Essays on Complex Discontinuity Designs and Inference With Missing Data
Access Status: Full text of the requested work is not available in DASH at this time ("dark deposit").
Diaz, Juan D.
Citation: Diaz, Juan D. 2019. Essays on Complex Discontinuity Designs and Inference With Missing Data. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract: This thesis consists of three chapters: two on discontinuity designs for causal inference and one on inference with missing data. Each chapter is self-contained.
Chapter 1. Regression discontinuity designs are extensively used for causal inference in observational studies. However, they are usually confined to settings with simple treatment rules, determined by a single running variable with a single cutoff. In this chapter, we propose a new framework for general discontinuity designs. The framework accommodates more complex treatment rules that may be determined by multiple running variables, each with many cutoffs, and that may lead to the same treatment. The running variables may be discrete, and the treatments need not be binary. Observed covariates play a central role in the identification, estimation, and generalization of causal effects. Identification essentially relies on a local unconfoundedness assumption. Estimation proceeds as in any observational study under the strong ignorability assumption, but within a neighborhood of the cutoffs of the running variables. We discuss estimation approaches based on matching and weighting, including additional regression adjustments in doubly robust estimators. We consider assumptions for generalization; that is, for identification and estimation of average treatment effects for target populations beyond the study sample that resides in a neighborhood of the cutoffs. We also examine a new approach to selecting the neighborhood for the analyses and assessing the plausibility of the assumptions. We argue that, in a sense, the traditional continuity and local randomization frameworks for regression discontinuity designs are particular cases of our proposed framework. We motivate this new framework with a case study of the impact of grade retention on juvenile crime, using a new and unique administrative data set with extensive educational and criminal records of the same students, observed for 15 years. In this case study, we find no effect of grade retention on the probability of criminal activity during youth.
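As a rough illustration of estimation "in a neighborhood of the cutoffs" under local unconfoundedness, the sketch below keeps units whose running variable lies within a bandwidth of a single cutoff, stratifies them on one discrete covariate, and averages the within-stratum differences in mean outcomes. This is a deliberately simple stand-in for the matching and weighting estimators the chapter mentions, not the author's method. The function name, the tuple layout, and the fixed bandwidth are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def local_stratified_effect(units, cutoff, bandwidth):
    """Covariate-adjusted difference in means near a cutoff (illustrative only).

    units: iterable of (running, covariate, treated, outcome) tuples.
    Units with |running - cutoff| <= bandwidth are kept and stratified on the
    discrete covariate. The result is the average of within-stratum
    differences in mean outcomes (treated minus control), weighted by stratum
    size. Strata that lack either group are skipped.
    """
    strata = defaultdict(lambda: {True: [], False: []})
    for running, covariate, treated, outcome in units:
        if abs(running - cutoff) <= bandwidth:
            strata[covariate][bool(treated)].append(outcome)

    total_size = 0
    weighted_sum = 0.0
    for groups in strata.values():
        if groups[True] and groups[False]:
            size = len(groups[True]) + len(groups[False])
            weighted_sum += size * (mean(groups[True]) - mean(groups[False]))
            total_size += size

    if total_size == 0:
        raise ValueError("no stratum in the neighborhood contains both groups")
    return weighted_sum / total_size
```

Exact stratification is only workable with a few discrete covariates. With richer covariates, matching or weighting on a balancing criterion, as the abstract describes, would replace the stratification step.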
Chapter 2. Regression discontinuity designs (RDDs) often use a coarsened version of an underlying continuous running variable, e.g., income coarsened into quantiles, age coarsened into years, or birth weight coarsened into kilograms. If the running variable is coarsened, the traditional approach to RDDs leads to inconsistent treatment effect estimates. Motivated by a case study of the impact of free college tuition on college enrollment and dropout rates in Chile, we propose a new approach to causal inference in RDDs with coarsened running variables. This approach relies on a Local Unconfoundedness Assumption, where observed covariates other than the running variable are crucial for identification and estimation in a neighborhood of the cutoff. This approach facilitates analyses in the presence of intermediate outcome variables, an issue of primary interest in our application. We also discuss the selection of the neighborhood of the cutoff for the analyses. Using a new and unique administrative data set, we estimate the effect of free college tuition on college enrollment and dropout. We exploit the RDD generated by the fact that students from the poorest 50 percent of households are eligible to receive free college tuition. Unfortunately, we only observe the decile membership of each student's household income, a coarsened version of household income. Moreover, two complications are present in our case study: noncompliance in the receipt of the benefit among eligible students and truncation of the dropout indicator by non-enrollment. Therefore, the effect on dropout rates is meaningful only for the principal stratum of students who would enroll in college regardless of their eligibility. Our analysis reveals that, although there is evidence of a positive effect of the free-tuition policy on enrollment, the impact on dropout rates for the always-enrolled compliers is null.
Chapter 3. In this chapter, we present a new nonparametric imputation method to estimate the marginal mean of an outcome variable with missing values, when missingness is governed by a set of fully observed covariates. For each unit with missing information, the imputed value is a weighted average of observed outcomes, with weights chosen to balance the observed covariates. The weights are determined by solving a linear programming problem that minimizes the distance between a weighted average of the covariate values of the units with complete data and the covariate value of the unit with missing data. Under the assumption that data are missing at random (MAR) and suitable conditions, we show that our estimator for the outcome mean is consistent at the square-root-N rate. We also present an asymptotic normality result for this estimator and provide a consistent estimator for its variance. Finite-sample properties of our proposal are investigated through Monte Carlo simulations. We implement this method in a new software package for Stata.
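One way to write down the imputation step described above is sketched below. The abstract does not state the distance, the constraints on the weights, or the exact form of the mean estimator, so the L1 norm, the simplex constraint, and the final averaging formula are assumptions made for illustration. The notation is also assumed: R_i = 1 when Y_i is observed, and X_i is the covariate vector of unit i.

```latex
% Illustrative formulation under the assumptions stated in the lead-in.
\begin{align*}
\hat{w}^{(i)} &\in \arg\min_{w} \Bigl\| \sum_{j : R_j = 1} w_j X_j - X_i \Bigr\|_1
  \quad \text{subject to } w_j \ge 0,\ \sum_{j : R_j = 1} w_j = 1, \\
\hat{Y}_i &= \sum_{j : R_j = 1} \hat{w}^{(i)}_j Y_j, \\
\hat{\mu} &= \frac{1}{N} \sum_{i=1}^{N} \bigl[ R_i Y_i + (1 - R_i)\, \hat{Y}_i \bigr].
\end{align*}
```

With the L1 norm, the first problem becomes a linear program once one auxiliary variable per covariate is introduced to bound each absolute deviation, which is consistent with the linear programming description in the abstract.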
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:42013138
- FAS Theses and Dissertations