Two-Stream Transformer Architecture With Discrete Attention for Better Interpretability and Separation of Model Concerns
Citation
Kinley, Jambay. 2020. Two-Stream Transformer Architecture With Discrete Attention for Better Interpretability and Separation of Model Concerns. Bachelor's thesis, Harvard College.
Abstract
The transformer has become a central model for natural language processing tasks ranging from translation to classification to representation learning. Its success demonstrates the effectiveness of stacked attention as a replacement for recurrence for many tasks. Attention is broadly interpreted as selectively attending to different parts of an input, so in theory it offers insight into the model's internal decisions; in practice, however, when stacked it quickly becomes nearly fully connected, making it hard to disentangle the dependencies behind a final decision.
In this work, we propose an alternative transformer architecture, the discrete transformer, with the goal of improving model interpretability. We use discrete latent variable attention to ensure that each decision step depends only on a limited context, and we separate attention decisions from representation modeling by using a separate stream for each.
Empirically, on both classification and translation tasks, this approach maintains performance comparable to the standard transformer on several datasets, while obtaining quantitatively better attention interpretability and separating out syntactic features in the learned representations.
Finally, our two-stream formulation can be used to transfer knowledge in a multiview arithmetic evaluation task.
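To make the two-stream idea in the abstract concrete, the following is a minimal, hypothetical sketch of a single layer that computes hard (discrete) attention from a separate decision stream and applies it to a content stream. The class name, the use of a straight-through Gumbel-softmax for the discrete choice, and all dimensions are illustrative assumptions, not the thesis's actual implementation.

```python
# Hypothetical sketch only: a two-stream layer with discrete (hard) attention.
# Names, dimensions, and the straight-through Gumbel-softmax relaxation are
# assumptions for illustration, not the architecture described in the thesis.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStreamHardAttention(nn.Module):
    def __init__(self, d_model: int, tau: float = 1.0):
        super().__init__()
        # Queries and keys come from the attention ("decision") stream.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        # Values come from the content ("representation") stream.
        self.v_proj = nn.Linear(d_model, d_model)
        self.tau = tau

    def forward(self, attn_stream: torch.Tensor, content_stream: torch.Tensor):
        # Both streams: (batch, seq_len, d_model)
        q = self.q_proj(attn_stream)
        k = self.k_proj(attn_stream)
        v = self.v_proj(content_stream)
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        # Discrete attention: each position selects exactly one other position.
        # hard=True returns one-hot samples in the forward pass while passing
        # soft gradients in the backward pass (straight-through estimator).
        alignment = F.gumbel_softmax(scores, tau=self.tau, hard=True, dim=-1)
        # Updated content plus the one-hot alignment, which names the single
        # token each position attended to and can be inspected directly.
        return alignment @ v, alignment

# Usage sketch: inspect which token each position chose.
x_attn = torch.randn(2, 5, 64)
x_content = torch.randn(2, 5, 64)
layer = TwoStreamHardAttention(d_model=64)
out, alignment = layer(x_attn, x_content)
print(out.shape, alignment.shape)  # torch.Size([2, 5, 64]) torch.Size([2, 5, 5])
```

Because the alignment is a one-hot matrix rather than a dense softmax distribution, each decision step depends on a single selected position, which is the property the abstract ties to improved interpretability.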
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page
https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364732
Collections
- FAS Theses and Dissertations