Publication: Covariate-Dependent Nonparametric Mixture Models
Date
2016-06-22
The Harvard community has made this article openly available.
Abstract
This work investigates the problem of learning nonparametric mixture models in the presence of external covariates of interest. This modeling problem is relevant whenever one wishes to explore the underlying structure of data under the hypothesis that this structure depends on certain exogenous factors and may grow progressively more complex as more observations are realized, as may be true for large text corpora. I present a general modeling framework based on dependent Dirichlet process priors and discuss the associated inferential issues. I then develop Covariate-Augmented Nonparametric Latent Dirichlet Allocation (C-LDA), a novel model of this class that allows dependencies on arbitrary external covariates in a very general and flexible way. I introduce both Markov chain Monte Carlo (MCMC) and variational inference procedures for the model. Finally, I apply C-LDA to several diverse problems, including modeling a corpus of opinion articles from the New York Times and performing haplotype phasing on familial genome sequences. I show that the estimates of the model parameters can be used to construct effective summary statistics that quantify and describe the relationships between the covariates of interest and the latent structure of the data. This underscores the significance of latent variable models as a powerful tool for understanding complex or unstructured data of many forms in the social and natural sciences.
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service