Publication:
On Scaling Dynamics in Deep Learning

Date

2023-09-11

Citation

Kaplun, Gal. 2023. On Scaling Dynamics in Deep Learning. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

In recent years, the field of Deep Learning has experienced remarkable growth, fueled by advances in computational power and data availability, arguably leading to the emergence of genuine Artificial Intelligence. One critical aspect underpinning this success is the predictable scaling of model performance, which follows power laws in the resources invested in training, such as model size, data volume, and training duration. This thesis delves into the concept of Scaling Dynamics, systematically investigating the relationship between a model's behavior and the resources invested in it. We explore key aspects of Scaling Dynamics, including Regimes of Learning, Monotonicity, Symmetries, and Trade-offs. Our main contributions are: (1) observing that DNNs first learn simple concepts and progressively acquire more complex ones as more resources are invested; (2) identifying the Deep Double Descent phenomenon, wherein test performance degrades and then improves again in the critical regime between under- and over-parameterization; (3) introducing the pointwise perspective, enabling a fine-grained understanding of distribution shifts; (4) demonstrating that in the online learning regime, SGD noise does not enhance optimization beyond computational considerations, and that the training dynamics do not deviate far from the path of Gradient Flow. These findings pave the way for future exploration, raising intriguing questions about the nature of Scaling Dynamics and how they shed light on Deep Learning as a whole.
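
To make the power-law claim above concrete, the following is a minimal sketch of the generic form such scaling laws typically take; the symbols and constants are illustrative placeholders, not notation taken from the thesis.

% Generic power-law scaling ansatz (illustrative, not from the thesis):
%   L(x): test loss as a function of a single training resource x,
%   where x stands for model size N, dataset size D, or compute C.
\[
    L(x) \;\approx\; \left(\frac{x_c}{x}\right)^{\alpha_x} + L_{\infty},
\]
% Here \alpha_x > 0 is a resource-specific scaling exponent, x_c a fitted
% scale constant, and L_{\infty} the irreducible loss. On a log-log plot
% this appears as a straight line of slope -\alpha_x that flattens out
% at the L_{\infty} floor.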

Keywords

Deep Learning, Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the repository's Terms of Service.
