Publication:
Inference for Incomplete Data and Dependent Data

Date

2018-05-15

Abstract

This thesis is about statistical inference for two classes of data: incomplete data (Chapters 1, 2 and 3) and dependent data (Chapter 4). Chapter 2 relaxes a crucial assumption made in Chapter 1, and Chapter 3 further generalizes the previous two chapters. Each chapter is designed to be self-contained.

Chapter 1. Multiple imputation (MI) inference handles missing data by first properly imputing the missing values m times and then combining the m analysis results obtained by applying a complete-data procedure to each of the completed datasets. However, the existing method for combining likelihood ratio tests has several defects: (i) the combined test statistic can be negative in practice when the reference null distribution is a standard F distribution; (ii) it is not invariant to re-parametrization; (iii) it fails to ensure monotonic power because it uses an inconsistent estimator of the fraction of missing information (FMI) under the alternative hypothesis; and (iv) it requires non-trivial access to the likelihood ratio test statistic as a function of estimated parameters rather than of datasets. This chapter shows, via both theoretical derivations and empirical investigations, that essentially all of these problems can be addressed straightforwardly if we are willing to perform an additional likelihood ratio test by stacking the m completed datasets into one large completed dataset. A particularly intriguing finding is that the FMI itself can be estimated consistently by a likelihood ratio statistic for testing whether the m completed datasets produced by MI can effectively be regarded as samples from a common model. Practical guidelines are provided based on an extensive comparison of existing MI tests.

Chapter 2. Multiple imputation (MI) is a general method for handling missing data. Hypothesis testing with incomplete datasets can be performed by combining likelihood ratio tests computed on the multiply imputed, completed datasets. However, most MI procedures require the assumption of equal fractions of missing information (EFMI), which is unlikely to hold in practice, so the performance of the resulting tests is in doubt. Although some existing tests do not require this assumption, they require access to the variance-covariance matrix of the estimator of the parameters of interest, which is problematic when the parameter dimension is high. This chapter proposes a new MI test statistic that does not rely on EFMI and requires users only to have access to the complete-data testing procedure. The proposed test uses a new technique called Jackknife multiple imputation, which is useful for estimating quantities related to the fraction of missing information. However, the conclusion of this chapter is negative: the proposed test and the standard MI tests (which assume EFMI) perform similarly in terms of size and power. In other words, we provide evidence supporting the use of the MI test proposed in Chapter 1 even when the EFMI assumption does not hold in practice.

Chapter 3. Real datasets often contain missing entries. Standard practice handles such datasets in three phases: data generation, data imputation and data analysis. Standard statistical procedures for complete data concern only the first and last phases, so when missing data are handled in the second phase, standard procedures may not be valid. Deriving valid procedures is well known to be difficult. In particular, constructing a consistent variance estimator without exchanging information between the imputer and the analysts has been an open problem for more than 20 years. In this chapter, we propose a class of imputation methods, called multiple multiple imputation (MMI), that allows the analysts to construct sensible variance estimators. Three examples of MMI are introduced. One of them produces a conservative variance estimator that is uniformly better than the current state-of-the-art estimator, whereas the other two produce consistent variance estimators. All proposed methods require the imputer only to pass imputed datasets to the analysts, as in the standard multiple imputation procedure. Moreover, the analysts are not required to know the imputer's model; they only need to know how to perform their own complete-data procedures. Hence, MMI procedures have desirable statistical and operational properties.

Chapter 4. We consider estimation of the asymptotic covariance matrix (ACM) in non-stationary time series. We propose a non-parametric estimator that is robust against unknown forms of trends and a possibly diverging number of change points. Combined with a computationally fast optimal bandwidth selector, the resulting ACM estimator is statistically efficient and easy to implement. It is useful for many statistical procedures, e.g., change-point detection and the construction of simultaneous confidence bands for trends.
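
As an illustration of the basic MI combining step described for Chapter 1, the following minimal Python sketch implements Rubin's classical combining rules for a scalar estimand, together with the usual large-sample estimate of the FMI. It is not the likelihood-ratio procedure proposed in the thesis. The function name `rubin_combine` and the example inputs are hypothetical, and the FMI formula deliberately omits the small-sample degrees-of-freedom correction.

```python
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Pool m scalar estimates with Rubin's classical combining rules (sketch).

    estimates: the m complete-data point estimates, one per completed dataset.
    variances: the m matching complete-data variance estimates.
    Returns (pooled estimate, total variance, r, large-sample FMI).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    if w <= 0:
        raise ValueError("within-imputation variance must be positive")
    t = w + (1 + 1 / m) * b        # total variance of q_bar
    r = (1 + 1 / m) * b / w        # relative increase in variance due to missing data
    fmi = r / (1 + r)              # large-sample FMI, equal to (1 + 1/m) * b / t
    return q_bar, t, r, fmi
```

For the made-up inputs `rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])`, the total variance is 11/6 and the estimated FMI is 8/11, so in this toy case most of the uncertainty comes from between-imputation variability.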
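
Defect (i) of Chapter 1 can be seen in a commonly used combining rule for likelihood ratio tests due to Meng and Rubin (1992), which we assume is the existing method the abstract refers to. The sketch below takes two averaged statistics as inputs and leaves their computation out of scope: `d_bar`, the mean of the m complete-data likelihood ratio statistics, and `d_tilde`, the mean of the statistics re-evaluated at the pooled parameter estimates. Needing `d_tilde` is the kind of access to the statistic as a function of estimated parameters mentioned in defect (iv). The function name `combined_lrt` and all inputs are hypothetical.

```python
def combined_lrt(d_bar, d_tilde, k, m):
    """Meng-Rubin style combined likelihood ratio statistic (sketch).

    d_bar:   mean of the m complete-data LRT statistics, each computed at that
             completed dataset's own null and alternative estimates.
    d_tilde: mean of the m LRT statistics re-evaluated at the pooled
             (averaged) null and alternative estimates.
    k:       number of parameters constrained under the null.
    m:       number of imputations.
    Returns (r_l, d_l); d_l is meant to be referred to an F distribution
    with k numerator degrees of freedom.
    """
    if m < 2 or k < 1:
        raise ValueError("need m >= 2 and k >= 1")
    # Moment estimate of the relative increase in variance due to missing data.
    # Nothing forces d_bar >= d_tilde in finite samples, so r_l can be negative.
    r_l = (m + 1) / (k * (m - 1)) * (d_bar - d_tilde)
    if 1 + r_l == 0:
        raise ValueError("degenerate case: 1 + r_l == 0")
    # If r_l < -1 the denominator is negative, and so is the combined statistic.
    d_l = d_tilde / (k * (1 + r_l))
    return r_l, d_l
```

With the made-up inputs `combined_lrt(1.0, 3.0, k=1, m=5)`, the estimate `r_l` is -3.0 and the combined statistic is -1.5, a negative value that cannot be compared meaningfully with an F reference distribution.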
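
For orientation on Chapter 4, the sketch below is the classical Bartlett-kernel (Newey-West type) long-run variance estimator for a scalar series, i.e. the scalar case of an ACM estimator under stationarity. It is not the trend-robust estimator proposed in the thesis: it centres the series at a single global mean, so an unknown trend or change points are absorbed into the variance estimate, which is the failure mode a trend-robust estimator has to avoid. The function name `bartlett_lrv` and the fixed `bandwidth` argument are illustrative assumptions; no bandwidth selector is included.

```python
def bartlett_lrv(x, bandwidth):
    """Bartlett-kernel long-run variance of a scalar series (sketch).

    Assumes a constant mean: the series is centred at its global sample mean,
    so any trend or level shift is treated as random variation.
    """
    n = len(x)
    if n < 2 or bandwidth < 0:
        raise ValueError("need at least two observations and bandwidth >= 0")
    mu = sum(x) / n
    e = [v - mu for v in x]  # centre at the global sample mean

    def gamma(j):
        # Lag-j sample autocovariance with divisor n.
        return sum(e[t] * e[t - j] for t in range(j, n)) / n

    lrv = gamma(0)
    for j in range(1, min(bandwidth, n - 1) + 1):
        weight = 1 - j / (bandwidth + 1)  # Bartlett (triangular) kernel weight
        lrv += 2 * weight * gamma(j)
    return lrv
```

For the purely deterministic linear trend `list(range(10))`, even bandwidth 0 gives an estimate of 8.25, although the series contains no random variation at all; the whole value comes from the trend.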

Keywords

Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.
