In the Blink of an Eye: A Unified Theory for Feature Emergence in Generative Models
Abstract
Generative models, which produce samples of data such as text or images, are transforming the way we interact with technology. However, they often fail abruptly in problematic and unintuitive ways. For example, a language model given a software engineering problem suddenly switched from coding to searching for pictures of Yellowstone National Park; such rapid shifts in behavior have also been observed in reasoning traces and hacks. This phenomenon is not unique to language models: in image generation models, key features of the final output, such as background objects or colors, are also decided in narrow “critical windows” of the generation process.
While critical windows for diffusion models, a particular type of image generation model, have been studied at length by statistical physicists, existing theory relies on the specifics of diffusion and on strong assumptions about the distribution of model generations. In this thesis, we develop a unifying framework for critical windows, showing that they emerge generically whenever the sampler specializes to a sub-population of the distribution it models. Drawing on tools from information theory, machine learning, high-dimensional probability, and statistical physics, our theory improves on previous work by using rigorous mathematical arguments and by being agnostic to the underlying model type and distribution, applying to language models and diffusion models alike. The key insight of our approach is to exploit the powerful formalism of stochastic localization for generative models, which has its roots as a proof technique in probability theory.
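The specialization mechanism can be illustrated with a toy example (not taken from the thesis): a one-dimensional diffusion-style schedule over a two-component Gaussian mixture. Tracking the posterior probability that a noisy sample belongs to one sub-population, one sees it jump from uninformative (0.5) to near-certain (≈1) inside a narrow range of noise levels, i.e. a critical window. All names and parameter values below are illustrative choices.

```python
import math
import random

def posterior_plus(x_t, a_t, mu):
    # p(component = "+" | x_t) under equal priors, where
    # x_t | component = ± is distributed as N(±sqrt(a_t)*mu, 1).
    lp = -0.5 * (x_t - math.sqrt(a_t) * mu) ** 2  # log-lik of "+" (up to const)
    lm = -0.5 * (x_t + math.sqrt(a_t) * mu) ** 2  # log-lik of "-"
    return 1.0 / (1.0 + math.exp(lm - lp))

def avg_posterior(a_t, mu, n=2000, seed=0):
    # Average posterior over noisy versions x_t = sqrt(a_t)*x0 + sqrt(1-a_t)*eps
    # of samples x0 drawn from the "+" cluster N(mu, 1).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x0 = mu + rng.gauss(0.0, 1.0)
        x_t = math.sqrt(a_t) * x0 + math.sqrt(1.0 - a_t) * rng.gauss(0.0, 1.0)
        total += posterior_plus(x_t, a_t, mu)
    return total / n

mu = 8.0  # large separation, mimicking well-separated high-dimensional clusters
for a_t in [0.0, 0.01, 0.05, 0.2, 1.0]:
    print(f"a_t={a_t:4.2f}  P(+ | x_t) ~ {avg_posterior(a_t, mu):.3f}")
```

Running the sketch shows the posterior rising from 0.5 at full noise to nearly 1 already by a small signal level, so cluster identity is effectively decided in a narrow early slice of the schedule; with smaller separation `mu`, the window widens.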
Leveraging this consolidated theory, we apply it to examples of critical windows in both theoretical and empirical contexts. We give a novel interpretation of the all-or-nothing phase transition in statistical inference as a critical window and use our framework to explain several failure modes of language models. Finally, we validate our predictions empirically on real-world models and demonstrate that critical windows have applications to improving the safety, privacy, and fairness of generative models.