Induction of Probabilistic Synchronous Tree-Insertion Grammars
MetadataShow full item record
CitationNesson, Rebecca, Stuart M. Shieber, and Alexander Rush. 2005. Induction of Probabilistic Synchronous Tree-Insertion Grammars. Harvard Computer Science Group Technical Report TR-20-05.
AbstractIncreasingly, researchers developing statistical machine translation systems have moved to incorporate syntactic structure in the models that they induce. These researchers are motivated by the intuition that the limitations in the finite-state translation models exemplified by IBM’s “Model 5” follow from the inability to use phrasal and hierarchical information in the interlingual mapping. What is desired is a formalism that has the substitution-based hierarchical structure provided by context-free grammars, with the lexical relationship potential of n-gram models, with processing efficiency no worse than CFGs. Further, it should ideally allow for discontinuity in phrases, and be synchronizable, to allow for multilinguality. Finally, in order to support automated induction, it should allow for a probabilistic variant. We introduce probabilistic synchronous tree-insertion grammars (PSTIG) as such a formalism. In this paper, we define a restricted version of PSTIG, and provide algorithms for parsing, parameter estimation, and translation. As a proof of concept, we successfully apply these algorithms to a toy problem, corpus-based induction of a statistical translator of arithmetic expressions from postfix to partially parenthesized infix.
Citable link to this pagehttp://nrs.harvard.edu/urn-3:HUL.InstRepos:25811006
- FAS Scholarly Articles