Training Restricted Boltzmann Machines on Word Observations

Title: Training Restricted Boltzmann Machines on Word Observations
Author: Dahl, George E.; Adams, Ryan Prescott; Larochelle, Hugo

Note: Order does not necessarily reflect citation order of authors.

Citation: Dahl, George E., Ryan Prescott Adams, and Hugo Larochelle. 2012. Training restricted Boltzmann machines on word observations. In Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, June 26 – July 1, 2012, ed. John Langford and Joelle Pineau, 679-686. Edinburgh: International Machine Learning Society.
Abstract: The restricted Boltzmann machine (RBM) is a flexible tool for modeling complex data; however, there have been significant computational difficulties in using RBMs to model high-dimensional multinomial observations. In natural language processing applications, words are naturally modeled by K-ary discrete distributions, where K is determined by the vocabulary size and can easily be in the hundreds of thousands. The conventional approach to training RBMs on word observations is limited because it requires sampling the states of K-way softmax visible units during block Gibbs updates, an operation that takes time linear in K. In this work, we address this issue by employing a more general class of Markov chain Monte Carlo operators on the visible units, yielding updates with computational complexity independent of K. We demonstrate the success of our approach by training RBMs on hundreds of millions of word n-grams using larger vocabularies than previously feasible and using the learned features to improve performance on chunking and sentiment classification tasks, achieving state-of-the-art results on the latter.
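
Illustrative sketch (not part of the original record): the abstract does not name the specific operator on this page, so the code below assumes one plausible member of the class it describes, a Metropolis-Hastings step with a fixed independence proposal over words (for example, corpus word frequencies) drawn through an alias table. All function names, the single-softmax-unit setup, and the parameter layout are hypothetical and chosen only for illustration; this is not the authors' code. Under these assumptions, each update evaluates the unnormalized log-probability of just two words, so its cost depends on the number of hidden units but not on K once the alias table has been built in a one-time O(K) pass.

```python
import math
import random


def build_alias_table(probs):
    """Vose's alias method: O(K) setup that allows O(1) draws from `probs`."""
    k = len(probs)
    scaled = [p * k for p in probs]
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    accept = [1.0] * k          # probability of keeping the drawn column
    alias = list(range(k))      # fallback index when the column is rejected
    while small and large:
        s = small.pop()
        l = large.pop()
        accept[s] = scaled[s]
        alias[s] = l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return accept, alias


def alias_draw(accept, alias, rng):
    """Draw one index in O(1) time using the precomputed table."""
    i = rng.randrange(len(accept))
    return i if rng.random() < accept[i] else alias[i]


def logit(word, hidden, weights, bias):
    """Unnormalized log p(v = word | h); reads a single row of `weights`."""
    return bias[word] + sum(w * h for w, h in zip(weights[word], hidden))


def mh_word_update(word, hidden, weights, bias, proposal, table, rng):
    """One Metropolis-Hastings update of a softmax visible unit.

    Assumes every entry of `proposal` is strictly positive. Only the current
    word and the candidate are scored, so no sum over the vocabulary is needed.
    """
    accept, alias = table
    candidate = alias_draw(accept, alias, rng)
    log_ratio = (
        logit(candidate, hidden, weights, bias)
        - logit(word, hidden, weights, bias)
        + math.log(proposal[word])
        - math.log(proposal[candidate])
    )
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return candidate
    return word
```

By contrast, an exact Gibbs draw of the same unit would have to compute and normalize all K logits. A chain built from repeated calls to `mh_word_update` leaves the same conditional distribution invariant, but it is only approximate after a finite number of steps, and how well it mixes depends on how closely the proposal matches the conditional.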
Terms of Use: This article is made available under the terms and conditions applicable to Open Access Policy Articles, as set forth at