Publication: Combinatorial Tasks as Model Systems of Deep Learning
Date
2024-05-16
Citation
Edelman, Benjamin. 2024. Combinatorial Tasks as Model Systems of Deep Learning. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
This dissertation is about a particular style of research. The philosophy of this style is that, in order to understand deep learning scientifically, it is fruitful to investigate what happens when neural networks are trained on simple, mathematically well-defined tasks. Even though the training data is simple, the training algorithm can produce rich and unexpected results, and understanding these results can shed light on fundamental mysteries of high relevance to contemporary deep learning.
First, we situate this methodological approach in a broader scientific context, discussing and systematizing the role of model systems in science and in the science of deep learning in particular. We then present five intensive case studies, each of which uses a particular combinatorial task as a lens through which to demystify puzzles of deep learning.
The combinatorial tasks employed are sparse Boolean functions, sparse parities, learning finite group operations, performing modular addition, and learning Markov chains in-context. Topics of explanatory interest include the inductive biases of the transformer architecture, the phenomenon of emergent capabilities during training, the nuances of deep learning in the presence of statistical-computational gaps, the tradeoffs between different resources of training, the effect of network width on optimization, the relationship between symmetries in training data and harmonic structure in trained networks, the origins of the mechanisms of in-context learning in transformers, and the influence of spurious solutions on optimization.
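The abstract names several of its tasks without defining them. As a rough illustration only, the Python sketch below shows one common way to generate examples for three of them: sparse parities, modular addition, and in-context Markov chains. The function names, the parameter values, and the Dirichlet-based construction of random transition matrices are assumptions made for this sketch, not the dissertation's exact experimental setups.

```python
import random

def sparse_parity_sample(n, support, rng):
    # x is uniform over {0,1}^n; the label is the XOR (sum mod 2) of the bits in `support`.
    # Only len(support) = k of the n bits matter, which is what makes the parity "sparse".
    x = [rng.randint(0, 1) for _ in range(n)]
    y = sum(x[i] for i in support) % 2
    return x, y

def modular_addition_table(p):
    # Every input pair (a, b) with 0 <= a, b < p, labeled by (a + b) mod p.
    return [((a, b), (a + b) % p) for a in range(p) for b in range(p)]

def random_markov_sequence(num_states, length, rng, alpha=1.0):
    # Assumed formulation: draw a fresh transition matrix for each sequence, with each
    # row sampled from a symmetric Dirichlet(alpha) via normalized Gamma draws, then
    # sample one trajectory. A model must infer the transition statistics in context.
    rows = []
    for _ in range(num_states):
        g = [rng.gammavariate(alpha, 1.0) for _ in range(num_states)]
        total = sum(g)
        rows.append([v / total for v in g])
    state = rng.randrange(num_states)
    seq = [state]
    for _ in range(length - 1):
        state = rng.choices(range(num_states), weights=rows[state])[0]
        seq.append(state)
    return rows, seq

if __name__ == "__main__":
    rng = random.Random(0)
    support = rng.sample(range(20), 3)  # hidden relevant bits; n = 20, k = 3 are illustrative
    x, y = sparse_parity_sample(20, support, rng)
    print(support, x, y)
    print(modular_addition_table(5)[:3])
    _, seq = random_markov_sequence(3, 10, rng)
    print(seq)
```

In each case the full input-output relationship is fixed by a short mathematical rule, which is what lets trained networks be compared against a known ground truth.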
Keywords
Computer science
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.