Publication: Bayesian Grammar Induction for Language Modeling
Date
1995
Authors
Chen, Stanley F.
The Harvard community has made this article openly available.
Citation
Chen, Stanley F. Bayesian Grammar Induction for Language Modeling. Harvard Computer Science Group Technical Report TR-01-95.
Abstract
We describe a corpus-based induction algorithm for probabilistic context-free grammars. The algorithm employs a greedy heuristic search within a Bayesian framework, and a post-pass using the Inside-Outside algorithm. We compare the performance of our algorithm to n-gram models and the Inside-Outside algorithm in three language modeling tasks. In two of these domains, our algorithm outperforms these other techniques, marking the first time a grammar-based language model has surpassed n-gram modeling in a task of at least moderate size.
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service