Publication:
Bayesian Grammar Induction for Language Modeling

Thumbnail Image

Date

1995

Published Version

Published Version

Journal Title

Journal ISSN

Volume Title

Publisher

The Harvard community has made this article openly available. Please share how this access benefits you.

Research Projects

Organizational Units

Journal Issue

Citation

Chen, Stanley F. Bayesian Grammar Induction for Language Modeling. Harvard Computer Science Group Technical Report TR-01-95.

Research Data

Abstract

We describe a corpus-based induction algorithm for probabilistic context-free grammars. The algorithm employs a greedy heuristic search within a Bayesian framework, and a post-pass using the Inside-Outside algorithm. We compare the performance of our algorithm to n-gram models and the Inside-Outside algorithm in three language modeling tasks. In two of these domains, our algorithm outperforms these other techniques, marking the first time a grammar-based language model has surpassed n-gram modeling in a task of at least moderate size.

Description

Other Available Sources

Keywords

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service

Endorsement

Review

Supplemented By

Referenced By

Related Stories