Publication:
Diagnostic Tools in Missing Data and Causal Inference on Time Series

Date

2018-05-16

Published Version

The Harvard community has made this article openly available.

Abstract

This thesis is divided into two self-contained parts. The first part focuses on diagnostic tools for missing data. Models for analyzing multivariate data sets with missing values require strong, often unassessable, assumptions. The most common of these is that the mechanism that created the missing data is ignorable, a twofold assumption dependent on the mode of inference. The first component of this assumption, which is the focus here, requires under the Bayesian and direct-likelihood paradigms that the missing data are missing at random (MAR); in contrast, the frequentist-likelihood paradigm demands that the missing data mechanism always produces MAR data, a condition known as missing always at random (MAAR). Under certain regularity conditions, we show that assuming MAAR leads to a series of results that can be tested using the observed data alone. Using our new results, we provide theoretical justifications for some existing diagnostic procedures and propose three new methods that not only indicate when MAAR is incorrect, but also suggest which variables are the most likely culprits. Although MAAR is not a necessary condition to ensure validity under the Bayesian and direct-likelihood paradigms, it is sufficient, and evidence for its violation should encourage the careful statistician to conduct a targeted sensitivity analysis.

The second part of this thesis focuses on causal inference from time series. We define causal estimands for experiments on single time series, extending the potential outcome framework to deal with temporal data. Our approach allows the estimation of a broad class of these estimands and yields exact randomization-based p-values for testing causal effects, without imposing stringent assumptions. We further derive a general central limit theorem for conducting conservative tests and building confidence intervals for causal effects. Finally, we provide three methods for generalizing our approach to multiple units that receive the same class of treatment over time. We test our methodology on simulated "potential autoregressions," which have a causal interpretation. Our methodology is partially inspired by data from many experiments carried out by a financial company that compared the impact of two different ways of trading equity futures contracts. We use our methodology to make causal statements about their trading methods.
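As a rough illustration of what an exact randomization-based p-value for a single time series can look like, the following Python sketch enumerates every possible assignment path for a short series. It is not the thesis's procedure: the independent Bernoulli assignment with probability p, the Horvitz-Thompson-style test statistic, the sharp null of no effect of any assignment path on any outcome, the function names, and the toy data are all assumptions made for this example.

```python
from itertools import product


def ht_statistic(y, w, p=0.5):
    """Horvitz-Thompson-style contrast between treated and control periods.

    Assumes each period is treated independently with probability p.
    """
    T = len(y)
    return sum(yt * (wt / p - (1 - wt) / (1 - p)) for yt, wt in zip(y, w)) / T


def exact_randomization_p_value(y, w_obs, p=0.5):
    """Exact p-value under the sharp null that no assignment path changes y.

    Under that null the observed outcomes stay fixed, so the statistic can be
    recomputed for every one of the 2**T assignment paths. Each path is
    weighted by its probability under the assumed Bernoulli(p) design.
    """
    T = len(y)
    t_obs = abs(ht_statistic(y, w_obs, p))
    p_value = 0.0
    for w in product((0, 1), repeat=T):
        n_treated = sum(w)
        prob = p ** n_treated * (1 - p) ** (T - n_treated)
        # Small tolerance so the observed path always counts itself.
        if abs(ht_statistic(y, w, p)) >= t_obs - 1e-12:
            p_value += prob
    return p_value


# Toy data (hypothetical): eight periods, alternating treatment.
y = [1.2, 0.4, 1.9, 0.1, 1.5, 0.3, 1.7, 0.2]
w_obs = [1, 0, 1, 0, 1, 0, 1, 0]
print(exact_randomization_p_value(y, w_obs))
```

Full enumeration is only practical for short series, since the number of assignment paths doubles with each added period; for longer series one would typically sample paths from the design instead, which gives a Monte Carlo approximation rather than an exact p-value.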

Keywords

Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
