Publication:

An opponent striatal circuit for distributional reinforcement learning

Date

2025-01-08

The Harvard community has made this article openly available.

Citation

Lowet, Adam Stanley. 2025. An Opponent Striatal Circuit for Distributional Reinforcement Learning. Doctoral Dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Machine learning research has achieved large performance gains on a wide range of tasks by expanding the learning target from mean rewards to entire probability distributions of rewards — an approach known as distributional reinforcement learning (DRL). The mesolimbic dopamine system is thought to underlie reinforcement learning in the mammalian brain by updating a representation of mean value in the striatum, but little is known about whether, where, and how neurons in this circuit encode information about higher-order moments of reward distributions.

In this dissertation, we first review the mathematical foundations and biological plausibility of these DRL algorithms. We then provide the first experimental and computational evidence that the striatum, specifically, learns such reward distributions. We used high-density probes (Neuropixels) to acutely record striatal activity from well-trained, water-restricted mice performing a classical conditioning task in which reward mean, reward variance, and stimulus identity were independently manipulated. In contrast to traditional RL accounts, distributions with the same mean and variance were encoded more similarly than those with the same mean but different variance. Remarkably, chronic ablation of dopamine inputs disorganized these distributional representations in the striatum without interfering with mean value coding. Two-photon calcium imaging and optogenetics revealed that the two major classes of striatal medium spiny neurons — D1 and D2 MSNs — contributed to this code by preferentially encoding the right and left tails of the reward distribution, respectively. We synthesize these findings into a new model of the striatum and mesolimbic dopamine that harnesses the opponency between D1 and D2 MSNs to reap the computational benefits of DRL.
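
As a rough illustration of the distributional idea summarized above, the following Python sketch learns several value estimates from the same reward stream, each scaling positive and negative prediction errors differently (an expectile-style update). It is a minimal sketch under stated assumptions, not the model developed in the dissertation: the function name `learn_expectiles`, the asymmetry levels in `TAUS`, the learning rate, and the toy reward distributions are all hypothetical, and the loose analogy between "optimistic" and "pessimistic" channels and D1/D2 MSNs is drawn only from the abstract's description of right-tail and left-tail coding.

```python
import random

def learn_expectiles(sample_reward, taus, lr=0.01, n_trials=20000, seed=0):
    """Learn one value estimate per asymmetry level tau.

    Each channel scales positive prediction errors by tau and negative
    ones by (1 - tau). tau > 0.5 gives an "optimistic" channel that settles
    toward the right tail; tau < 0.5 gives a "pessimistic" channel that
    settles toward the left tail. (Hypothetical illustration only.)
    """
    rng = random.Random(seed)
    values = [0.0 for _ in taus]
    for _ in range(n_trials):
        r = sample_reward(rng)
        for i, tau in enumerate(taus):
            delta = r - values[i]                    # reward prediction error
            scale = tau if delta > 0 else 1.0 - tau  # asymmetric scaling
            values[i] += lr * scale * delta
    return values

def fixed_reward(rng):
    """Zero-variance distribution with mean 4 (assumed toy example)."""
    return 4.0

def variable_reward(rng):
    """Same mean of 4, but nonzero variance (assumed toy example)."""
    return rng.choice([2.0, 6.0])

TAUS = [0.1, 0.5, 0.9]  # pessimistic, balanced, optimistic channels

fixed_code = learn_expectiles(fixed_reward, TAUS)
variable_code = learn_expectiles(variable_reward, TAUS)

print("fixed:   ", [round(v, 2) for v in fixed_code])
print("variable:", [round(v, 2) for v in variable_code])
```

In this toy setting the balanced channel (tau = 0.5) ends up near the shared mean for both distributions, so a mean-only learner could not tell them apart. The pessimistic and optimistic channels agree for the fixed reward but diverge for the variable one, so the spread across channels carries information about variance.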

Keywords

dopamine, population coding, reinforcement learning, reward, striatum, Neurosciences

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
