Bayesian Grammar Induction for Language Modeling
Citation: Chen, Stanley F. Bayesian Grammar Induction for Language Modeling. Harvard Computer Science Group Technical Report TR-01-95.
Abstract: We describe a corpus-based induction algorithm for probabilistic context-free grammars. The algorithm employs a greedy heuristic search within a Bayesian framework, and a post-pass using the Inside-Outside algorithm. We compare the performance of our algorithm to n-gram models and the Inside-Outside algorithm in three language modeling tasks. In two of these domains, our algorithm outperforms these other techniques, marking the first time a grammar-based language model has surpassed n-gram modeling in a task of at least moderate size.
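Comparing a grammar against n-gram models as a language model requires the probabilistic context-free grammar to assign a probability to each sentence. As an illustration only, and not the report's induction algorithm, the following is a minimal sketch of the inside pass of the Inside-Outside algorithm, assuming a grammar in Chomsky normal form. The toy grammar, the dictionary layout, and the names `sentence_probability` and `perplexity` are hypothetical.

```python
import math
from collections import defaultdict


def sentence_probability(words, binary_rules, lexical_rules, start="S"):
    """Inside pass: P(words | grammar) for a PCFG in Chomsky normal form.

    Hypothetical layout (an assumption for this sketch):
      binary_rules:  {(A, B, C): P(A -> B C)}
      lexical_rules: {(A, word): P(A -> word)}
    """
    n = len(words)
    # inside[i][j][A] = probability that nonterminal A derives words[i:j]
    inside = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    # Spans of length 1 are covered by lexical rules A -> word.
    for i, w in enumerate(words):
        for (a, word), p in lexical_rules.items():
            if word == w:
                inside[i][i + 1][a] += p

    # Longer spans: sum over split points k and binary rules A -> B C.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                left, right = inside[i][k], inside[k][j]
                for (a, b, c), p in binary_rules.items():
                    if b in left and c in right:
                        inside[i][j][a] += p * left[b] * right[c]

    return inside[0][n][start]


def perplexity(prob, num_words):
    """Per-word perplexity 2 ** (-log2 P / N), the usual language-model metric."""
    return 2 ** (-math.log2(prob) / num_words)


# Hypothetical toy grammar for illustration.
BINARY = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
LEXICAL = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}

sentence = "she eats fish".split()
p = sentence_probability(sentence, BINARY, LEXICAL)
print(p)  # 0.25
print(perplexity(p, len(sentence)))
```

The full Inside-Outside algorithm pairs this pass with an outside pass to compute expected rule counts for re-estimation; that step is omitted here for brevity.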
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:23017264