Publication: Causal Mediation Analysis Reveals Syntactic Agreement Mechanisms in Neural Language Models
Abstract
Targeted syntactic evaluations have demonstrated that language models can perform subject-verb agreement even in difficult contexts. Although this ability is well established, the mechanisms by which neural language models achieve syntactic agreement are still not well understood. To address this gap, this thesis applies causal mediation analysis to pre-trained neural language models to locate the model components and discover the mechanisms responsible for predicting correctly inflected verbs. In particular, we investigate the magnitude of models’ preferences for grammatical inflections and compare which neurons process subject-verb agreement across sentences with different syntactic structures. Our results uncover both similarities and differences across architectures and model sizes and offer a glimpse of the within-model mechanisms that produce number agreement. Notably, we find that larger models do not necessarily learn stronger preferences; we observe two distinct mechanisms for producing subject-verb agreement, depending on the syntactic structure of the input sentence; and we find that language models rely on similar sets of neurons when given sentences with similar syntactic structure.