Publication: On Scaling Dynamics in Deep Learning
Date
2023-09-11
Authors
Kaplun, Gal
Citation
Kaplun, Gal. 2023. On Scaling Dynamics in Deep Learning. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
In recent years, the field of Deep Learning has experienced remarkable growth, fueled by advancements in computational power and data availability, arguably leading to the emergence of genuine Artificial Intelligence. One critical aspect underpinning this success is the predictable scaling of model performance, which adheres to power laws in the resources invested in training, such as model size, data volume, and training duration. This thesis delves into the concept of Scaling Dynamics, systematically investigating the relationship between a model's behavior and the resources invested in it. We explore key aspects of Scaling Dynamics, including Regimes of Learning, Monotonicity, Symmetries, and Trade-offs. Our main contributions are: (1) Observing that DNNs initially learn simple concepts and progressively increase in complexity as a function of resources invested; (2) Identifying the Deep Double Descent phenomenon, wherein models exhibit unexpected behavior in the critical regime between under- and over-parameterization; (3) Introducing the pointwise perspective, enabling a fine-grained understanding of distribution shifts; (4) Demonstrating that in the online learning regime, SGD noise does not enhance optimization beyond computational considerations, and that the training dynamics do not deviate far from the path of Gradient Flow.
These findings pave the way for future exploration, raising intriguing questions about the nature of Scaling Dynamics and how they shed light on Deep Learning as a whole.
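To illustrate the kind of power-law scaling the abstract refers to, a commonly used generic form (an illustrative sketch, not the specific fit estimated in this dissertation) writes the test loss $L$ as a power law in a training resource $R$ (model size, data volume, or training duration):

\[
L(R) \approx L_\infty + \left(\frac{R_0}{R}\right)^{\alpha_R},
\]

where $L_\infty$ denotes an irreducible loss floor, $R_0$ a scale constant, and $\alpha_R > 0$ the resource-specific scaling exponent; these symbols are placeholders for exposition rather than quantities reported in the thesis.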
Keywords
Deep Learning, Computer science
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service