Publication: Induction of Probabilistic Synchronous Tree-Insertion Grammars
Date
2005
Citation
Nesson, Rebecca, Stuart M. Shieber, and Alexander Rush. 2005. Induction of Probabilistic Synchronous Tree-Insertion Grammars. Harvard Computer Science Group Technical Report TR-20-05.
Abstract
Increasingly, researchers developing statistical machine translation systems have moved to incorporate syntactic structure in the models that they induce. These researchers are motivated by the intuition that the limitations of finite-state translation models, exemplified by IBM’s “Model 5”, follow from their inability to use phrasal and hierarchical information in the interlingual mapping. What is desired is a formalism that combines the substitution-based hierarchical structure provided by context-free grammars (CFGs) with the lexical-relationship potential of n-gram models, at a processing efficiency no worse than that of CFGs. Further, it should ideally allow for discontinuity in phrases and be synchronizable, to allow for multilinguality. Finally, in order to support automated induction, it should allow for a probabilistic variant. We introduce probabilistic synchronous tree-insertion grammars (PSTIG) as such a formalism. In this paper, we define a restricted version of PSTIG and provide algorithms for parsing, parameter estimation, and translation. As a proof of concept, we successfully apply these algorithms to a toy problem: corpus-based induction of a statistical translator of arithmetic expressions from postfix to partially parenthesized infix.
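To make the toy problem concrete, the sketch below shows the kind of input/output pair such a translator would map between. It is a deterministic, hand-written converter, not the induced PSTIG translator the report describes, and it is not taken from the report. It assumes that "partially parenthesized" means parentheses appear only where precedence or associativity requires them; the report may define the target language differently. The names `PRECEDENCE` and `postfix_to_infix` are hypothetical.

```python
# Hypothetical illustration of the toy task's input/output relation.
# Assumption: "partially parenthesized" = parentheses only where needed.
# This is NOT the statistical, grammar-induction method of the report.

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
NON_ASSOCIATIVE = {"-", "/"}  # a - (b - c) differs from a - b - c
ATOM = 3  # operands bind tighter than any operator


def postfix_to_infix(tokens):
    """Convert a list of postfix tokens to an infix string."""
    stack = []  # entries: (infix_text, precedence of top-level operator)
    for tok in tokens:
        if tok in PRECEDENCE:
            if len(stack) < 2:
                raise ValueError("malformed postfix expression")
            right, right_prec = stack.pop()
            left, left_prec = stack.pop()
            prec = PRECEDENCE[tok]
            # The left operand needs parentheses only if it binds more loosely.
            if left_prec < prec:
                left = f"({left})"
            # The right operand also needs them at equal precedence
            # when the current operator is - or /.
            if right_prec < prec or (
                right_prec == prec and tok in NON_ASSOCIATIVE
            ):
                right = f"({right})"
            stack.append((f"{left} {tok} {right}", prec))
        else:
            stack.append((tok, ATOM))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0][0]


if __name__ == "__main__":
    for source in ["3 4 + 5 *", "3 4 5 * +", "8 3 2 - -"]:
        print(source, "=>", postfix_to_infix(source.split()))
```

A converter like this could also be used to generate a small parallel corpus of postfix and infix strings, which is the kind of data a corpus-based induction procedure would train on; whether the report's corpus was produced this way is not stated in the abstract.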
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service