Building the Theoretical Foundations of Deep Learning: An Empirical Approach

Date

2022-05-18

Citation

Bansal, Yamini. 2022. Building the Theoretical Foundations of Deep Learning: An Empirical Approach. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

While tremendous practical progress has been made in deep learning, we lack a clear theoretical understanding of what makes deep learning work well, and why. In this thesis, we take a "natural sciences" approach towards building a theory for deep learning. We begin by identifying various empirical properties that emerge in practical deep networks across a variety of different settings. Then, we discuss how these empirical findings can be used to inform theory. Specifically, we show the following: (1) In contrast with supervised learning, state-of-the-art deep networks trained with self-supervised learning achieve a bounded generalization gap under certain conditions, despite being over-parameterized. (2) Models with similar performance and architecture often converge to similar internal representations, even when their training methods differ substantially (e.g., supervised vs. self-supervised learning). (3) Interpolating classifiers obey a form of distributional generalization: they converge to a type of conditional sampler from the training distribution. (4) The data scaling properties of deep networks are robust to changes in the architecture and in the noise level of the training dataset. Our findings highlight that, despite the lack of worst-case guarantees, deep networks implicitly behave in a predictable, structured manner, thus laying the foundations for future theoretical analysis.

Keywords

Deep Learning, Engineering

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
