Publication:
Covariate-Dependent Nonparametric Mixture Models

Date

2016-06-22

Abstract

This work investigates the problem of learning nonparametric mixture models in the presence of external covariates of interest. This modeling problem is relevant whenever one wishes to explore the underlying structure of data, under the hypothesis that this structure depends on certain exogenous factors and may grow progressively more complex as more observations are realized, as may be true for large text corpora. I present a general modeling framework based on dependent Dirichlet process priors, and discuss the associated inferential issues. I then develop Covariate-Augmented Nonparametric Latent Dirichlet Allocation (C-LDA), a novel model of this class in which dependencies on arbitrary external covariates can be present in a very general and flexible way. I introduce both Markov chain Monte Carlo (MCMC) and variational inference procedures for the model. Finally, I apply C-LDA to several diverse problems, including modeling a corpus of opinion articles from the New York Times and performing haplotype phasing on familial genome sequences. I show that the estimates of the model parameters can be used to construct effective summary statistics that quantify and describe the relationships between the covariates of interest and the latent structure of the data. This underscores the significance of latent variable models as a powerful tool for understanding complex or unstructured data of many forms in the social and natural sciences.

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service