Publication: Deep Latent Variable Models of Natural Language
Open/View Files
Date
Authors
Published Version
Published Version
Journal Title
Journal ISSN
Volume Title
Publisher
Citation
Abstract
Understanding natural language involves complex underlying processes by which meaning is extracted from surface form. One approach to operationalizing such phenomena in computational models of natural language is through probabilistic latent variable models, which can encode structural dependencies among observed and unobserved variables of interest within a probabilistic framework. Deep learning, on the other hand, offers an alternative computational approach to modeling natural language through end-to-end learning of expressive, global models, where any phenomena necessary for the task are captured implicitly within the hidden layers of a neural network. This thesis explores a synthesis of deep learning and latent variable modeling for natural language processing applications. We study a class of models called deep latent variable models, which parameterize components of probabilistic latent variable models with neural networks, thereby retaining the modularity of latent variable models while at the same time exploiting rich parameterizations enabled by recent advances in deep learning. We experiment with different families of deep latent variable models to target a wide range of language phenomena---from word alignment to parse trees---and apply them to core natural language processing tasks including language modeling, machine translation, and unsupervised parsing.
We also investigate key challenges in learning and inference that arise when working with deep latent variable models for language applications. A standard approach for learning such models is through amortized variational inference, in which a global inference network is trained to perform approximate posterior inference over the latent variables. However, a straightforward application of amortized variational inference is often insufficient for many applications of interest, and we consider several extensions to the standard approach that lead to improved learning and inference. In summary, each chapter presents a deep latent variable model tailored for modeling a particular aspect of language, and develops an extension of amortized variational inference for addressing the particular challenges brought on by the latent variable model being considered. We anticipate that these techniques will be broadly applicable to other domains of interest.