Publication: Statistical Mechanics of Generalization in Kernel Regression and Wide Neural Networks
Abstract
A theoretical understanding of generalization remains an open problem for many machine learning models, including deep neural networks. Here, we study this problem for kernel regression, which, besides being a popular machine learning method, also describes wide neural networks. We develop an analytical theory of generalization in kernel regression using the replica theory of statistical mechanics, which is applicable to any kernel and data distribution. Experiments with practical kernels, including those arising from wide neural networks, show perfect agreement with our theory. We provide an in-depth analysis of our theory for kernel generalization. We show that kernel machines employ an inductive bias towards simple functions, preventing them from overfitting the data. We characterize whether a kernel is compatible with a learning task in terms of sample efficiency. We identify a first-order phase transition in our theory where more data may impair generalization when the task is noisy or not expressible by the kernel. We extend these results to out-of-distribution generalization and quantum kernel machines. We study representation learning in Bayesian neural networks using perturbation theory, and show that the features of wide neural networks receive corrections from the target labels.