Publication: Learning in Neural Networks: Lazy training, Feature Learning, and Fine-Tuning
Abstract
Neural networks trained on large amounts of data have found groundbreaking applications in language modeling, vision, and many other fields. The modern machine learning pipeline usually involves pre-training a model on a large, diverse dataset and then post-training it (e.g. fine-tuning) on specialized downstream tasks. Models learn good representations of the data in the pre-training stage, which are later tuned in the post-training stage. Despite the vast success of this pipeline, the exact mechanisms by which models adapt their features to downstream tasks remain poorly understood.
In this thesis, we first explore existing theoretical work on overparametrization, generalization, and representation learning. To that end, we survey the mathematical techniques used in the literature to answer these questions, ranging from the neural tangent kernel and the mean-field method to drift martingale analysis. Along the way, we present original insights through self-contained examples and proofs.
Finally, we present original work on low-rank fine-tuning, which establishes a separation between fine-tuning and the other learning regimes in the literature. In particular, we show that while fine-tuning is different from lazy training, it has significantly lower sample and iteration complexity than full feature learning.