Publication: On Scaling Dynamics in Deep Learning
Abstract
In recent years, the field of Deep Learning has experienced remarkable growth, fueled by advances in computational power and data availability, arguably leading to the emergence of genuine Artificial Intelligence. One critical aspect underpinning this success is the predictable scaling of model performance, which follows power laws in the resources invested in training, such as model size, data volume, and training duration. This thesis studies the concept of Scaling Dynamics, systematically investigating the relationship between a model's behavior and the resources invested in it. We explore key aspects of Scaling Dynamics, including Regimes of Learning, Monotonicity, Symmetries, and Trade-offs. Our main contributions are: (1) Observing that DNNs first learn simple concepts and capture progressively more complex ones as more resources are invested; (2) Identifying the Deep Double Descent phenomenon, wherein test performance behaves non-monotonically in the critical regime between under- and over-parameterization; (3) Introducing the pointwise perspective, which enables a fine-grained, per-example understanding of distribution shifts; (4) Demonstrating that in the online learning regime, SGD noise does not enhance optimization beyond its computational benefits, and that the training dynamics do not deviate far from the path of Gradient Flow. These findings pave the way for future exploration, raising intriguing questions about the nature of Scaling Dynamics and what they reveal about Deep Learning as a whole.
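To make the power-law framing concrete, a minimal illustrative form is given below; the symbols $L$, $N$, $D$, $C$, the exponents, and the constants are assumptions for exposition only, not the thesis' exact parameterization.

\[
  L(x) \;\approx\; \left(\frac{x_0}{x}\right)^{\alpha_x} + L_\infty,
  \qquad x \in \{N, D, C\},
\]

where $L$ denotes the test loss, $N$ the number of model parameters, $D$ the dataset size, $C$ the training compute, $\alpha_x > 0$ an empirically fitted exponent for the corresponding resource, and $L_\infty$ an irreducible loss floor. Sketches of this kind capture the "predictable scaling" referred to in the abstract: improving any single resource axis yields diminishing but regular returns on a log-log plot.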