Publication: Structure Modeling for Language Models
Date
2023-05-12
Authors
Deng, Yuntian
Published Version
Citation
Deng, Yuntian. 2023. Structure Modeling for Language Models. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
Neural language models are probabilistic models of text parameterized by neural networks. They apply broadly to tasks whose outputs are discrete sequences, such as document summarization, question answering, and image captioning. Because they make minimal assumptions about the data, advances in language modeling drive improvements across a diverse array of applications.
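For context, the standard autoregressive factorization underlying most neural language models (a general fact about the field, not a claim specific to this dissertation) writes the probability of a token sequence x_1, ..., x_T as a product of conditional next-token distributions, each parameterized by a neural network with parameters theta:

    p_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} p_\theta(x_t \mid x_1, \dots, x_{t-1})

This factorization assumes only that outputs are discrete sequences, which is why, as the abstract notes, the same machinery carries over to summarization, captioning, and other sequence-output tasks.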
In natural language, structure is both pervasive and essential. For example, a book is organized into chapters with a logical flow connecting them; without this structure, the book would lose its coherence. Effectively understanding and modeling textual sequences therefore requires representing these inherent structures. This thesis focuses on structure modeling for language models.
The thesis is organized into two primary sections: structure analysis for language models and structure modeling techniques. The first section examines how well language model generations capture various structural aspects, including section-transition structures, coreference structures, and topic-correlation structures. Emphasizing the need for a comprehensive understanding of these components, the thesis assesses language model performance at the structural level. By adapting a statistical framework to evaluate high-level coherence in machine-generated text, the research finds that even large language models show limitations in capturing discourse coherence and coreference. Furthermore, it demonstrates that improvements in surface-level modeling do not necessarily translate into better structure modeling.
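The abstract does not specify the statistical framework used, so the following Python sketch is purely illustrative of the kind of section-transition analysis described: it scores a document's section ordering under a first-order Markov model of section-label transitions estimated from reference documents. All names here (transition_log_likelihood, the section labels) are invented for illustration and are not from the dissertation.

    from collections import Counter, defaultdict
    import math

    def transition_log_likelihood(reference_sequences, eval_sequence, smoothing=1.0):
        # Estimate first-order transition counts between section labels
        # from the reference documents.
        counts = defaultdict(Counter)
        labels = set(eval_sequence)
        for seq in reference_sequences:
            labels.update(seq)
            for prev, nxt in zip(seq, seq[1:]):
                counts[prev][nxt] += 1
        vocab_size = len(labels)
        # Score the evaluated document's section ordering; Laplace smoothing
        # keeps unseen transitions from receiving zero probability.
        log_prob = 0.0
        for prev, nxt in zip(eval_sequence, eval_sequence[1:]):
            total = sum(counts[prev].values())
            prob = (counts[prev][nxt] + smoothing) / (total + smoothing * vocab_size)
            log_prob += math.log(prob)
        return log_prob

    # A conventionally ordered document should score higher than a shuffled one.
    refs = [["intro", "methods", "results", "conclusion"],
            ["intro", "background", "methods", "results", "conclusion"]]
    print(transition_log_likelihood(refs, ["intro", "methods", "results", "conclusion"]))
    print(transition_log_likelihood(refs, ["results", "intro", "conclusion", "methods"]))

A score of this kind evaluates structure independently of surface wording, which is the distinction the abstract draws between surface-level and structure-level modeling.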
The second section of the thesis presents a variety of structure modeling techniques aimed at improving or customizing language models. These techniques are organized into three categories: factorized structure modeling, hierarchical structure modeling, and global structure modeling. They can improve the structural coherence, transparency, computational efficiency, and data efficiency of language models.
In conclusion, this research provides an in-depth exploration of structure analysis and modeling techniques for language models. Through the development of various structure analysis methods and modeling approaches, the thesis aims to deepen our understanding of language models and to improve their capacity to represent structure. The proposed techniques have the potential to boost language model performance across a broad array of applications, advancing not only natural language generation but also other domains where the output space consists of discrete sequences, such as computer vision, robotics, and genomics.
Keywords
language model, Natural Language Processing, structure modeling, text generation, Computer science, Artificial intelligence
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service