Publication: Statistical Methods for Data With Latent Structures
Date
2018-05-11
Abstract
This dissertation develops statistical methods to study and utilize the latent structure of data. The latent structures of interest include, but are not limited to, latent heterogeneity in rank data, latent seasonal components of univariate time series, and latent factors and change-points of multivariate time series. We build all models from a Bayesian perspective and develop different types of statistical inference tailored to the motivations and purposes of specific real-data applications. The dissertation consists of three self-contained chapters.
Chapter 1 studies the rank aggregation problem with covariates and heterogeneous rankers. We propose the Bayesian Aggregation of Rank-data with Covariates (BARC) and its extensions, which not only produce a complete aggregated ranking list but also assess the individual reliability and overall consistency of rankers. Specifically, the two extensions account for the rankers' varying qualities and heterogeneous ranking opinions, respectively. We develop efficient full Bayesian inference via a parameter-expanded Gibbs sampler. Simulation studies show that our methods outperform other existing methods in a variety of scenarios. Finally, we apply the proposed methods to real-data problems in sports and medical studies.
Chapter 2 studies the forecasting of unemployment initial claims with the help of Internet search data. We present a novel statistical method, Penalized Regression with Inferred Seasonality Module (PRISM), to better forecast (including nowcast) unemployment initial claims weeks into the future. PRISM is semi-parametric, as it collectively considers a wide range of parametric time series models. We introduce a general state space formulation that contains a variety of widely used time series models as special cases, together with a joint model for Internet search data that places all contemporaneous time series in the same system. From this general formulation we derive a universal predictive model for forecasting initial claims, and we develop a two-stage estimation procedure that combines nonparametric seasonal decomposition with L1-penalized regression. PRISM outperforms all alternatives in out-of-sample testing.
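The two-stage idea (remove seasonality first, then fit an L1-penalized regression on the adjusted series and the search data) can be illustrated with a minimal, dependency-free sketch. It is not the PRISM implementation. Assumptions made only for illustration: the seasonal component is a per-phase mean (a crude stand-in for a nonparametric decomposition such as STL), the predictors are `n_lags` lags of the seasonally adjusted series plus the current search-term values, and the lasso is fitted by plain coordinate descent. The names `seasonal_adjust`, `lasso_cd` and `prism_like_forecast` are hypothetical.

```python
def seasonal_adjust(y, period):
    """Stage 1 (simplified): per-phase means as the seasonal component.
    Assumes len(y) >= period. Returns (phase_effects, adjusted_series)."""
    overall = sum(y) / len(y)
    phase = []
    for p in range(period):
        vals = y[p::period]
        phase.append(sum(vals) / len(vals) - overall)
    adjusted = [y[t] - phase[t % period] for t in range(len(y))]
    return phase, adjusted


def soft_threshold(z, lam):
    """Proximal operator of lam * |b|."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Stage 2: coordinate descent for
    (1 / 2n) * ||y - b0 - X b||^2 + lam * ||b||_1 (intercept unpenalized)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    b0 = sum(y) / n
    resid = [yi - b0 for yi in y]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        # Re-centre the intercept on the current residuals.
        shift = sum(resid) / n
        b0 += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot enter the model
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                resid = [resid[i] - X[i][j] * delta for i in range(n)]
                b[j] = new
    return b0, b


def prism_like_forecast(y, x, period, n_lags, horizon, lam):
    """Forecast y[T - 1 + horizon] from lags of the adjusted series and the
    search-term row x[t]; the seasonal effect is added back at the end."""
    phase, adj = seasonal_adjust(y, period)
    T = len(y)
    rows, targets = [], []
    for t in range(n_lags - 1, T - horizon):
        rows.append([adj[t - k] for k in range(n_lags)] + list(x[t]))
        targets.append(adj[t + horizon])
    b0, b = lasso_cd(rows, targets, lam)
    t = T - 1
    feats = [adj[t - k] for k in range(n_lags)] + list(x[t])
    pred_adj = b0 + sum(bj * f for bj, f in zip(b, feats))
    return pred_adj + phase[(t + horizon) % period]
```

Separating the steps keeps the seasonal module swappable: replacing `seasonal_adjust` with a proper nonparametric decomposition leaves the penalized-regression stage unchanged.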
Chapter 3 introduces a Bayesian factor model with multiple change-points to estimate the time-varying covariance of high-dimensional time series. In the high-dimensional setting, we place a spike-and-slab LASSO prior on the factor loadings so that the estimated loading matrix is sparse and interpretable. On top of the factor model, we assume piecewise stationary distributions for the factors to accommodate changes over time. We then propose an efficient EM algorithm that estimates the posterior mode of the model by taking advantage of L1-regularized regression and algorithms for exact change-point detection. The number of factors and the number of change-points are treated as unknown and inferred coherently from the observed data and the model specification. In applications to real data, our method delivers highly interpretable latent factors and meaningful change-points.
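To make "exact change-point detection" concrete, here is a minimal sketch of the optimal partitioning recursion, which searches over all segmentations by dynamic programming and returns the global minimizer of total segment cost plus a penalty per change-point. It is a simplification for illustration only: it uses a univariate mean-shift model with a squared-error cost, whereas the dissertation's change-points govern the distribution of latent factors inside an EM algorithm, and which exact algorithm the dissertation uses is not stated here. The names `segment_cost` and `optimal_partitioning` are hypothetical.

```python
def segment_cost(prefix, prefix_sq, s, e):
    """Squared-error cost of fitting a single mean to y[s:e], via prefix sums."""
    n = e - s
    total = prefix[e] - prefix[s]
    total_sq = prefix_sq[e] - prefix_sq[s]
    return total_sq - total * total / n


def optimal_partitioning(y, penalty):
    """Return change-point indices minimising
    sum(segment costs) + penalty * (number of change-points).
    Each returned index s means a new segment starts at y[s]."""
    n = len(y)
    # Prefix sums make every segment cost O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    # best[t] = optimal penalised cost of y[:t]; starting at -penalty means
    # the first segment is not charged for a change-point.
    best = [0.0] * (n + 1)
    best[0] = -penalty
    last = [0] * (n + 1)  # start index of the final segment in best[t]
    for t in range(1, n + 1):
        best[t], last[t] = min(
            (best[s] + segment_cost(prefix, prefix_sq, s, t) + penalty, s)
            for s in range(t)
        )

    # Backtrack from the end of the series to recover the change-points.
    change_points = []
    t = n
    while t > 0:
        s = last[t]
        if s > 0:
            change_points.append(s)
        t = s
    return sorted(change_points)
```

For example, `optimal_partitioning([0.0] * 10 + [5.0] * 10, penalty=1.0)` returns `[10]`. The recursion costs O(n²) in the series length; pruning rules such as PELT can reduce this while keeping the solution exact under mild conditions on the cost.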
Keywords
Statistics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.