Publication: Structured Neural Models for Coreference and Generation
Abstract
Natural Language Processing (NLP) has recently entered a period marked by impressive empirical performance on a wide variety of natural language tasks, with much of this empirical success due to the use of deep learning techniques. Deep learning has, in particular, offered a simple approach for learning expressive, global models of linguistic data. While incredibly powerful, this style of modeling poses a problem for NLP tasks that require structured prediction, that is, the prediction of outputs with combinatorial structure such as sequences, trees, and graphs. Indeed, whereas the standard approach to tackling structured prediction problems in NLP involves predicting the structure with the highest score under the learned model, it is often intractable to find the highest-scoring structure under the sort of global model common in deep learning approaches to NLP. In this thesis, we argue that search-based structured prediction, where a model is trained to search incrementally for a structure, is a particularly natural choice for doing structured prediction with deep models. Specifically, we argue that recurrent neural networks make it simple and convenient to compactly represent the history of incremental predictions made during search, which allows for the learning of powerful search-based structured predictors. We first investigate this approach to deep structured prediction in the context of the NLP task of coreference resolution. In particular, we first discuss a baseline neural coreference resolver, which achieved state-of-the-art performance when it was introduced, and we then show that a search-based, structured approach improves on it further. We then discuss an approach to training the celebrated sequence-to-sequence model as a search-based structured predictor, and we show that this leads to improvements on word ordering, dependency parsing, and machine translation tasks.
In the final chapter of this thesis we discuss the structured prediction problem of long-form text generation, and database-to-text generation in particular, which is not well handled by the techniques introduced in the preceding chapters. We introduce a new dataset for studying the challenges posed by this structured prediction problem, suggest new automatic approaches to evaluating performance on this problem, and use these automatic approaches to analyze the performance of various state-of-the-art generation models.