Publication:

How to Get Transformers to Process in Steps

Date

2022-05-23

Citation

Šakenis, Simas. 2022. How to Get Transformers to Process in Steps. Bachelor's thesis, Harvard College.

Abstract

As pointed out by Daniel Kahneman, there are at least two qualitatively distinct modes of cognition that take place in the human brain: fast and automatic thinking, labeled “System 1”, and slow and methodical thinking, labeled “System 2”. Progress in Artificial Intelligence requires approaches for modeling both kinds of cognition with computers. Classical programming is very effective for solving many tasks from the domain of System 2, but is not practical for solving tasks from the domain of System 1. Machine learning is very effective for solving many tasks from the domain of System 1, but has not yet been shown to work robustly on tasks from the domain of System 2. In this work, we investigate if and how machine learning could be successfully applied for algorithmic tasks which are solved via System 2 by humans.

We argue that the standard way of framing machine learning problems that is suitable for System 1 tasks is inadequate for System 2 tasks, propose an alternative, and demonstrate its effectiveness. Specifically, while learning a direct mapping from inputs to outputs is feasible for System 1 tasks, we argue that algorithmic System 2 tasks can only be solved by learning a mapping from inputs to outputs through a series of intermediate steps. We first show that by using enough intermediate steps a 1-layer Transformer can in principle compute any finite function. We then show empirically that a 1-layer Transformer cannot learn to compute the sum of binary numbers directly from the inputs, but is able to compute the sum when trained to first generate a series of intermediate results. This demonstrates, at a small scale, how a fixed-size neural network can lack the expressivity to encode the direct input-output mapping for an algorithmic task and yet be fully capable of computing the outputs through intermediate steps. Finally, we show that a Frozen Pretrained Transformer is able to learn binary addition when trained to compute the carry bits before the sum, while it fails to learn the task without using the intermediates. These results support our hypothesis that the use of intermediate computations is necessary for tackling algorithmic tasks from the domain of System 2 via machine learning.
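
To make the contrast between direct and stepwise targets concrete, the following is a minimal sketch of how a binary addition training example might be serialized in each framing. It is not taken from the thesis: the function names (`add_with_carries`, `direct_example`, `stepwise_example`), the `c:`/`s:` target layout, and the choice of equal-length, most-significant-bit-first strings are all assumptions made for illustration.

```python
def add_with_carries(a: str, b: str) -> tuple[str, str]:
    """Add two equal-length binary strings (most significant bit first).

    Returns (carry_bits, sum_bits). carry_bits[i] is the carry produced
    at the same bit position as a[i]; sum_bits is one bit longer than
    the inputs so the final carry is never lost.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of bits")

    carry = 0
    carries = []  # collected least significant bit first
    sums = []     # collected least significant bit first
    for x, y in zip(reversed(a), reversed(b)):
        total = int(x) + int(y) + carry
        sums.append(str(total % 2))
        carry = total // 2
        carries.append(str(carry))

    # The last carry becomes the leading bit of the sum.
    sum_bits = str(carry) + "".join(reversed(sums))
    carry_bits = "".join(reversed(carries))
    return carry_bits, sum_bits


def direct_example(a: str, b: str) -> str:
    """Hypothetical 'direct' framing: the target is only the sum."""
    _, sum_bits = add_with_carries(a, b)
    return f"{a}+{b}={sum_bits}"


def stepwise_example(a: str, b: str) -> str:
    """Hypothetical 'stepwise' framing: carry bits are emitted before the sum."""
    carry_bits, sum_bits = add_with_carries(a, b)
    return f"{a}+{b}=c:{carry_bits};s:{sum_bits}"


if __name__ == "__main__":
    print(direct_example("0110", "0111"))    # 0110+0111=01101
    print(stepwise_example("0110", "0111"))  # 0110+0111=c:0110;s:01101
```

In the stepwise framing, each sum bit depends only on the two input bits at its position and the adjacent carry bit already present in the sequence, which is one way to read the abstract's claim that intermediate results make the task tractable for a small fixed-size model.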

Keywords

System 2, Transformers, Artificial intelligence, Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
