Publication: Hidden State Inference in the Midbrain Dopamine System
Open/View Files
Date
Authors
Published Version
Published Version
Journal Title
Journal ISSN
Volume Title
Publisher
Citation
Abstract
Midbrain dopamine neurons signal reward prediction error (RPE), or actual minus expected reward. The temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead compute a value signal based on an inferred distribution of hidden states (a ‘belief state’). In Chapter 1, I asked whether dopaminergic signaling supports a TD learning framework that operates over hidden states. I found that dopamine signaling exhibited a striking difference between two tasks that differed only with respect to whether reward was delivered deterministically. My results favor an associative learning rule that combines cached values with hidden state inference. In Chapter 2, I used the behavioral paradigm developed in Chapter 1 to examine a possible cortical contribution to computing dopamine RPEs. I found that inactivation of the medial prefrontal cortex (mPFC) affected dopaminergic signaling in a task in which the state of the environment was hidden and must be inferred, but not in a task in which the state was known with certainty. Computational modeling suggests that the effects of inactivation are best explained by a circuit in which the mPFC conveys inference over hidden states to the dopamine system. My findings support a reinforcement learning circuitry in which the mPFC furnishes inferences about hidden states into the subcortical reward prediction machinery, with dopamine neurons signaling errors in these reward predictions. These results provide key insights into how the brain implements reinforcement learning, particularly in ambiguous settings.