Publication: Odors as "natural language": sparse neural networks in mammalian olfactory systems and large language models
Abstract
Physics, neuroscience, and artificial intelligence (AI) have a long, intertwined history. In particular, sparse connectivity is a common feature of neural networks in the brain and a key focus in AI for efficient computation; notably, pruning trained networks to obtain sparse connectivity has a long history, partially inspired by neuroscience. This thesis explores sparse neural networks through two linked research topics: one focused on the brain (bilateral alignment in olfactory systems) and the other on AI (pruning large language models for on-device AI assistants).
For the first topic, Chapter 1 starts from the observation that mammals' two nostrils create two cortical neural representations of odors, and studies how to construct the inter-hemispheric projections that align these representations. We hypothesized that this construction arises from online learning, since mammals breathe constantly. With a local Hebbian rule, we found that sparse inter-hemispheric projections suffice for bilateral alignment and discovered an inverse scaling: the more cortical neurons there are, the sparser the projections can be. We also found that the local Hebbian rule approximates the global stochastic gradient descent (SGD) rule because their update vectors align, suggesting that biologically plausible learning rules can approximate global learning rules if they contain the gradient information of the latter.
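The update-vector alignment mentioned above can be illustrated with a toy calculation. The sketch below is not the thesis's model: it assumes a single linear, sparsely masked projection `W` from one hemisphere's odor representation `x` to the other's `y`, a squared-error loss, and random Gaussian activity. The function name `hebbian_sgd_cosine` and all parameter values are hypothetical.

```python
import numpy as np

def hebbian_sgd_cosine(n_neurons=200, sparsity=0.9, seed=0):
    """Cosine similarity between a Hebbian and an SGD update (toy linear model)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_neurons)  # presynaptic activity (one hemisphere)
    y = rng.standard_normal(n_neurons)  # postsynaptic target (other hemisphere)

    # Assumption: a fixed random mask decides which projections exist at all.
    mask = rng.random((n_neurons, n_neurons)) > sparsity
    W = 0.01 * rng.standard_normal((n_neurons, n_neurons)) * mask

    # Local Hebbian update: post-synaptic activity times pre-synaptic activity.
    hebb = np.outer(y, x) * mask
    # SGD descent direction for the loss 0.5 * ||y - W x||^2 (negative gradient).
    sgd = np.outer(y - W @ x, x) * mask

    return np.sum(hebb * sgd) / (np.linalg.norm(hebb) * np.linalg.norm(sgd))

if __name__ == "__main__":
    print(hebbian_sgd_cosine())
```

In this toy setting the two updates differ only by the term `W @ x`, so they stay nearly parallel while the weights are small; how the alignment behaves in the actual olfactory model is the subject of the thesis itself.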
The next chapter extends Chapter 1 in four directions: an analysis of the update-vector alignment between the Hebbian and SGD rules and how it depends on the network parameters; a simple theory that recurrent connections in the olfactory cortex may improve bilateral alignment, inspired by Hopfield networks (associative memory) and similar to the design of Google's Titans model, which combines recurrent neural networks with Transformers; the dynamical properties of Hebbian learning; and finally, the geometric landscape of Hebbian learning.
A similar inverse scaling has been discovered in the Transformer attention matrices used in large language models (LLMs), which motivated the second topic. Concretely, we pruned the pretrained Meta Llama-2 and Llama-3 models to obtain models with fewer parameters for developing on-device AI assistants, explored their sparsity limits, and compared their performance at those limits. We found that more than 50% of the parameters in both models could be pruned, and that Llama-3 produced fewer factual errors at its sparsity limit but required more parameters, presumably because of its training settings and dataset.
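As a rough illustration of what pruning to a target sparsity means, the sketch below zeroes the smallest-magnitude entries of a single weight matrix. The abstract does not state which pruning criterion was used for the Llama models, so unstructured magnitude pruning is assumed here purely for illustration; the function name `magnitude_prune` is hypothetical.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    Assumption: unstructured magnitude pruning on one matrix, not the
    thesis's actual procedure. Ties at the threshold are all pruned.
    """
    k = int(sparsity * weights.size)  # number of entries to remove
    pruned = weights.copy()
    if k == 0:
        return pruned
    # k-th smallest absolute value serves as the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 64))
    w_sparse = magnitude_prune(w, sparsity=0.5)
    print("fraction of zeros:", np.mean(w_sparse == 0.0))
```

Finding a sparsity limit in this spirit would mean raising `sparsity` until task performance degrades beyond an acceptable level, which requires evaluating the full pruned model rather than a single matrix.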
In summary, by studying sparsity in both biological and artificial neural networks, this thesis may provide valuable insights into the general bilateral alignment problem in neuroscience (across different modalities and brain regions, such as the frontal cortex, responsible for short-term memory and motor response, and the medial entorhinal cortex, responsible for spatial memory), open the door to interesting theoretical questions, and inspire more efficient AI algorithms and applications.