Publication: Dopamine reward prediction errors reflect hidden state inference across time
Date
2017
Citation
Starkweather, Clara Kwon, Benedicte M. Babayan, Naoshige Uchida, and Samuel J. Gershman. 2017. “Dopamine reward prediction errors reflect hidden state inference across time.” Nature Neuroscience 20 (4): 581-589. doi:10.1038/nn.4520. http://dx.doi.org/10.1038/nn.4520.
Abstract
Midbrain dopamine neurons signal reward prediction error (RPE), or actual minus expected reward. The temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead compute a value signal based on an inferred distribution of hidden states (a ‘belief state’). In this work, we asked whether dopaminergic signaling supports a TD learning framework that operates over hidden states. We found that dopamine signaling exhibited a striking difference between two tasks that differed only with respect to whether reward was delivered deterministically. Our results favor an associative learning rule that combines cached values with hidden state inference.
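As an illustration of the belief-state idea described in the abstract, the sketch below computes a single TD(0) reward prediction error when value is a linear function of a belief vector rather than of an observed stimulus feature. It is a minimal textbook-style example and not the authors' model: the function name `td_belief_update`, the two-state belief vectors, and the values of the learning rate `alpha` and discount factor `gamma` are all assumptions chosen for illustration.

```python
import numpy as np

def td_belief_update(w, b, b_next, reward, alpha=0.1, gamma=0.98):
    """One TD(0) step with value linear in the belief state (illustrative).

    w       : weight vector, one cached value per hidden state (assumed form)
    b       : current belief, a probability distribution over hidden states
    b_next  : belief after the next observation
    reward  : reward received on this step
    """
    v = w @ b                  # expected value under the current belief
    v_next = w @ b_next        # expected value under the next belief
    delta = reward + gamma * v_next - v   # RPE: actual minus expected
    w_new = w + alpha * delta * b         # credit states in proportion to belief
    return w_new, delta

# Hypothetical two-state example: certain of state 0, then certain of state 1.
w = np.zeros(2)
b = np.array([1.0, 0.0])
b_next = np.array([0.0, 1.0])
w, delta = td_belief_update(w, b, b_next, reward=1.0)
```

With all weights starting at zero, the reward is entirely unpredicted, so the prediction error equals the reward and only the weight for the believed state is updated.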
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service