Publication: An opponent striatal circuit for distributional reinforcement learning
Abstract
Machine learning research has achieved large performance gains on a wide range of tasks by expanding the learning target from mean rewards to entire probability distributions of rewards — an approach known as distributional reinforcement learning (DRL). The mesolimbic dopamine system is thought to underlie reinforcement learning in the mammalian brain by updating a representation of mean value in the striatum, but little is known about whether, where, and how neurons in this circuit encode information about higher-order moments of reward distributions.
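A common way to make "learning a distribution instead of a mean" concrete is expectile-based DRL: each value unit uses one learning rate for positive prediction errors and another for negative ones. The sketch below is a minimal illustration under that assumption, not the method used in this dissertation. The function name `expectile`, the asymmetry parameter `tau`, and the two example distributions are hypothetical. To keep the result deterministic, it uses the expected (averaged) update over outcomes rather than sampled rewards. A unit with `tau = 0.5` is ordinary mean-value learning.

```python
def expectile(values, probs, tau, lr=0.1, n_steps=5000):
    """Fixed point of asymmetric TD learning on a discrete reward distribution.

    tau scales positive prediction errors and (1 - tau) scales negative ones;
    tau = 0.5 recovers the mean. Uses the expected update, not sampled rewards.
    """
    e = sum(p * v for v, p in zip(values, probs))  # start at the mean
    for _ in range(n_steps):
        step = 0.0
        for v, p in zip(values, probs):
            delta = v - e  # prediction error for this outcome
            alpha = tau if delta > 0 else 1.0 - tau  # asymmetric learning rate
            step += p * alpha * delta
        e += lr * step
    return e


# Two hypothetical reward distributions with the same mean (4) but different variance.
LOW_VAR = ([3.0, 5.0], [0.5, 0.5])
HIGH_VAR = ([0.0, 8.0], [0.5, 0.5])

for tau in (0.1, 0.5, 0.9):
    lo = expectile(*LOW_VAR, tau)
    hi = expectile(*HIGH_VAR, tau)
    print(f"tau={tau}: low-variance {lo:.2f}, high-variance {hi:.2f}")
```

With these inputs the `tau = 0.5` unit converges to 4.00 for both distributions, so a mean-only learner cannot tell them apart. The `tau = 0.1` unit gives 3.20 versus 0.80 and the `tau = 0.9` unit gives 4.80 versus 7.20, so a population spanning several asymmetries carries information about spread as well as mean.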
In this dissertation, we first review the mathematical foundations and biological plausibility of DRL algorithms. We then provide the first experimental and computational evidence that the striatum specifically learns reward distributions. We used high-density probes (Neuropixels) to acutely record striatal activity from well-trained, water-restricted mice performing a classical conditioning task in which reward mean, reward variance, and stimulus identity were independently manipulated. Contrary to traditional, mean-based RL accounts, which predict that distributions with equal means should be encoded alike, distributions with the same mean and variance were encoded more similarly than those with the same mean but different variance. Remarkably, chronic ablation of dopamine inputs disorganized these distributional representations in the striatum without interfering with mean-value coding. Two-photon calcium imaging and optogenetics revealed that the two major classes of striatal medium spiny neurons (D1 and D2 MSNs) contributed to this code by preferentially encoding the right and left tails of the reward distribution, respectively. We synthesize these findings into a new model of the striatum and mesolimbic dopamine that harnesses the opponency between D1 and D2 MSNs to reap the computational benefits of DRL.
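The D1/D2 division of labor can be illustrated with a toy opponent population, under explicitly assumed rules that are not taken from the dissertation's model. Each hypothetical unit learns online from sampled rewards with separate learning rates for positive and negative prediction errors. "D1-like" units are simply assumed to weight positive errors more, so they track the right tail. "D2-like" units are assumed to weight negative errors more, so they track the left tail. The function names, asymmetry values, learning rate, and trial count are all illustrative.

```python
import random


def train_population(values, probs, taus, lr=0.01, n_trials=20000, seed=0):
    """Online asymmetric TD learning for a population of value units (toy model)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(taus)
    for _ in range(n_trials):
        reward = rng.choices(values, weights=probs)[0]  # sample one outcome per trial
        for i, tau in enumerate(taus):
            delta = reward - estimates[i]  # reward prediction error for unit i
            # tau scales positive errors, (1 - tau) scales negative errors
            alpha = lr * (tau if delta > 0 else 1.0 - tau)
            estimates[i] += alpha * delta
    return estimates


# Hypothetical asymmetries: D1-like units are "optimistic", D2-like "pessimistic".
D1_TAUS = [0.6, 0.7, 0.8, 0.9]
D2_TAUS = [0.1, 0.2, 0.3, 0.4]


def summarize(values, probs):
    """Average learned value of the D1-like and D2-like populations."""
    d1 = train_population(values, probs, D1_TAUS)
    d2 = train_population(values, probs, D2_TAUS)
    return sum(d1) / len(d1), sum(d2) / len(d2)


# Same mean (4), different variance.
for name, dist in {"low variance": ([3.0, 5.0], [0.5, 0.5]),
                   "high variance": ([0.0, 8.0], [0.5, 0.5])}.items():
    d1_mean, d2_mean = summarize(*dist)
    print(f"{name}: D1-like {d1_mean:.2f}, D2-like {d2_mean:.2f}, "
          f"opponent spread {d1_mean - d2_mean:.2f}")
```

For these symmetric two-outcome distributions, the D1-like average should settle near 4.5 (low variance) and 6.0 (high variance), and the D2-like average near 3.5 and 2.0, up to sampling noise. The midpoint of the two populations stays near the shared mean of 4, while their difference grows with variance. This is one simple way an opponent pair could carry both mean and spread.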