Publication:

Sequential Discrete Latent Variables for Language Modeling

Date

2018-06-29

Abstract

We introduce a variant of the variational RNN (VRNN) model with discrete latent states to increase interpretability in RNN-based language models. Finding that naively training the model results in the same posterior collapse phenomenon observed in many other autoregressive tasks, we take the special case of an HMM, where exact inference is tractable, and examine the optimization challenges in that setting. We learn that sampling to compute the optimization objective likely causes optimization of the inference network to be intractable. Since the exact ELBO can be computed in the case of an HMM, we train an inference network for an HMM generative model (without any posterior collapse), then initialize a VRNN using the HMM's parameters and inference network. We find that fine-tuning this model and adding non-Markovian transitions between latent time steps lets the model approach an LSTM-based language model's performance, while maintaining a sparse discrete latent state.
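
The abstract's point that exact inference is tractable for an HMM can be illustrated with the standard forward algorithm, which sums over all latent state sequences in O(T K^2) time rather than sampling them. The sketch below is not taken from the thesis: the function names, the two-state toy parameters, and the two-token vocabulary are assumptions chosen purely for illustration, and it assumes a non-empty token sequence.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(log_pi, log_A, log_B, tokens):
    """Exact log p(x_1..x_T) for a discrete HMM via the forward algorithm.

    log_pi[k]   : log p(z_1 = k)
    log_A[i][j] : log p(z_t = j | z_{t-1} = i)
    log_B[k][x] : log p(x_t = x | z_t = k)
    tokens      : non-empty list of observed token ids
    """
    K = len(log_pi)
    # alpha[k] = log p(x_1..x_t, z_t = k), initialised at t = 1
    alpha = [log_pi[k] + log_B[k][tokens[0]] for k in range(K)]
    for x in tokens[1:]:
        # Marginalise over the previous state, then emit the next token
        alpha = [
            logsumexp([alpha[i] + log_A[i][j] for i in range(K)]) + log_B[j][x]
            for j in range(K)
        ]
    return logsumexp(alpha)


# Hypothetical toy model: 2 latent states, vocabulary of 2 tokens
log_pi = [math.log(p) for p in (0.6, 0.4)]
log_A = [[math.log(p) for p in row] for row in ((0.7, 0.3), (0.2, 0.8))]
log_B = [[math.log(p) for p in row] for row in ((0.9, 0.1), (0.3, 0.7))]

# Sanity check: probabilities of all length-3 sequences should sum to 1
total = sum(
    math.exp(hmm_log_likelihood(log_pi, log_A, log_B, list(seq)))
    for seq in product(range(2), repeat=3)
)
```

Because this quantity is computed without sampling, it gives an exact reference against which a sampled objective can be compared in the HMM setting.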

Keywords

Computer Science, Mathematics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
