Publication: Parsing Inside-Out
Open/View Files
Date
Authors
Published Version
Published Version
Journal Title
Journal ISSN
Volume Title
Publisher
Citation
Abstract
Probabilistic Context-Free Grammars (PCFGs) and variations on them have recently become some of the most common formalisms for parsing. It is common with PCFGs to compute the inside and outside probabilities. When these probabilities are multiplied together and normalized, they produce the probability that any given non-terminal covers any piece of the input sentence. The traditional use of these probabilities is to improve the probabilities of grammar rules. In this thesis we show that these values are useful for solving many other problems in Statistical Natural Language Processing. We give a framework for describing parsers. The framework generalizes the inside and outside values to semirings. It makes it easy to describe parsers that compute a wide variety of interesting quantities, including the inside and outside probabilities, as well as related quantities such as Viterbi probabilities and n-best lists. We also present three novel uses for the inside and outside probabilities. The first novel use is an algorithm that gets improved performance by optimizing metrics other than the exact match rate. The next novel use is a similar algorithm that, in combination with other techniques, speeds Data-Oriented Parsing, by a factor of 500. The third use is to speed parsing for PCFGs using thresholding techniques that approximate the inside-outside product; the thresholding techniques lead to a 30 times speedup at the same accuracy level as conventional methods. At the time this research was done, no state of the art grammar formalism could be used to compute inside and outside probabilities. We present the Probabilistic Feature Grammar formalism, which achieves state of the art accuracy, and can compute these probabilities.