Publication: How and What Does GPT Capture about Non-Compositionality?
Abstract
The impressive performance and prevalence of transformers such as GPT in language modeling necessitate an examination of the inner workings of these "black box" deep learning models. Language models have historically had difficulty encoding non-compositional phrases, that is, expressions such as "hot dog" whose meaning differs from the combination of the meanings of their individual words. Thus, we analyze whether and how the transformer GPT-2 captures non-compositionality. We examine the model's internal representations when it processes non-compositional phrases, using contextualized word embeddings and attention weights, the key components of the transformer architecture. We find differences in both word representations and attention patterns between compositional and non-compositional phrases, especially in the upper layers. To determine the extent to which GPT-2 encodes non-compositionality, we implement two probing tasks. The first task shows that GPT-2 can encode the correct paraphrase of an idiom with high accuracy. The second probing task additionally reveals that GPT-2 can distinguish between literal and idiomatic expressions with high accuracy. Attention head ablation experiments on the second probing task show that high performance is maintained even when over 40% of heads are directly ablated. Our work provides a better understanding of how phrases are processed, which is critical to developing more interpretable architectures and linguistic representations.
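To make the idea of attention head ablation concrete, the following is a minimal toy sketch in plain Python. It is not the authors' code and does not load GPT-2. It assumes that a head is ablated by zeroing its output before the per-head outputs are concatenated, which is one common approach; the abstract does not say how ablation was performed. The helper names (`attention_head`, `multi_head`, `random_matrix`), the dimensions, and the random inputs are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(queries, keys, values):
    """Causal scaled dot-product attention for a single head.

    Token i attends only to tokens 0..i, as in a left-to-right model.
    Returns (outputs, weights), each with one row per token.
    """
    d = len(queries[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        out = [
            sum(w * values[j][k] for j, w in enumerate(weights))
            for k in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


def multi_head(heads_qkv, head_mask):
    """Concatenate per-head outputs; a mask entry of 0 ablates that head.

    Assumption: ablation means zeroing the head's output vector.
    """
    n_tokens = len(heads_qkv[0][0])
    combined = [[] for _ in range(n_tokens)]
    for (q, k, v), keep in zip(heads_qkv, head_mask):
        out, _ = attention_head(q, k, v)
        for t in range(n_tokens):
            combined[t].extend(x * keep for x in out[t])
    return combined


def random_matrix(rng, rows, cols):
    """Hypothetical stand-in for learned query/key/value projections."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_tokens, d_head, n_heads = 3, 2, 4
heads = [
    tuple(random_matrix(rng, n_tokens, d_head) for _ in range(3))
    for _ in range(n_heads)
]

full = multi_head(heads, [1, 1, 1, 1])
ablated = multi_head(heads, [1, 0, 1, 0])  # ablate heads 1 and 3
```

In an ablation experiment of the kind described above, a probing classifier would be evaluated on representations computed with different head masks, and its accuracy compared against the unablated baseline.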