Publication:
Statistical mechanics of Bayesian inference and learning in neural networks

Date

2024-04-11

The Harvard community has made this article openly available.

Citation

Zavatone-Veth, Jacob Andreas. 2024. Statistical mechanics of Bayesian inference and learning in neural networks. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

This thesis collects a few of my essays towards understanding representation learning and generalization in neural networks. I focus on the model setting of Bayesian learning and inference, where the problem of deep learning is naturally viewed through the lens of statistical mechanics. First, I consider properties of freshly initialized deep networks, with all parameters drawn according to Gaussian priors. I provide exact solutions for the marginal prior predictive of networks with isotropic priors and linear or rectified-linear activation functions. I then study the effect of introducing structure to the priors of linear networks from the perspective of random matrix theory. Turning to memorization, I consider how the choice of nonlinear activation function affects the storage capacity of treelike neural networks. Then, I come at last to representation learning. I study the structure of learned representations in Bayesian neural networks at large but finite width, which are amenable to perturbative treatment. I then show how the ability of these networks to generalize when presented with unseen data is affected by representational flexibility, through precise comparison to models with frozen, random representations. In the final portion of this thesis, I bring a geometric perspective to bear on the structure of neural network representations. I first consider how the demand of fast inference shapes optimal representations in recurrent networks. Then, I consider the geometry of representations in deep object classification networks from a Riemannian perspective. In total, this thesis begins to elucidate the structure and function of optimally distributed neural codes in artificial neural networks.
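As a concrete illustration of the model setting described in the first part of the abstract (a minimal sketch, not code from the thesis), the snippet below draws networks from an isotropic Gaussian prior with the usual 1/width variance scaling and evaluates the resulting prior predictive at a fixed input. The depth, widths, activation, and number of prior draws are illustrative assumptions.

    # Minimal sketch: empirical prior predictive of a freshly initialized
    # fully connected network with isotropic Gaussian priors on all weights.
    import numpy as np

    rng = np.random.default_rng(0)

    def sample_prior_predictive(x, widths, activation=lambda z: np.maximum(z, 0.0)):
        """Draw one network from the Gaussian prior and evaluate it at inputs x.

        x      : (n_inputs, d_in) array of inputs
        widths : layer widths [d_in, n_1, ..., n_L, d_out]
        Weights are i.i.d. N(0, 1/fan_in), i.e. an isotropic prior with
        1/width variance scaling; no nonlinearity on the readout layer.
        """
        h = x
        for l in range(len(widths) - 1):
            W = rng.standard_normal((widths[l], widths[l + 1])) / np.sqrt(widths[l])
            h = h @ W
            if l < len(widths) - 2:
                h = activation(h)
        return h

    # Histogram the network output at a single input over many prior draws;
    # as the hidden widths grow, this marginal approaches a Gaussian.
    x = rng.standard_normal((1, 10))
    draws = np.array([sample_prior_predictive(x, [10, 500, 500, 1]) for _ in range(2000)])
    print(draws.mean(), draws.std())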

Keywords

Deep learning, Random matrices, Theoretical neuroscience, Theoretical physics, Neurosciences, Statistical physics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
