Publication:
Policy compression: Acting with limited cognitive resources

Date

2024-05-10

Citation

Lai, Lucy. 2024. Policy compression: Acting with limited cognitive resources. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

In a complex and information-rich world, how do humans and other animals learn and make decisions with limited cognitive resources? This thesis aims to answer this question by presenting a novel theoretical framework for understanding how capacity-limited agents should allocate their cognitive resources to balance the pursuit of reward with the burden of cognitive cost. Specifically, I formalize the idea of policy compression, or the simplification of action policies to reduce their representational cost. Policy compression marries reinforcement learning with information theory by representing policies as capacity-limited channels that transmit information about state to guide action selection. The upper bound on capacity results in a resource-rational trade-off between reward and the complexity cost of representing one's action policies. In this body of work, I employ computational modeling, behavioral experiments, and evidence from lesion studies to explore the implications and applications of policy compression to illuminate how biological agents act with limited cognitive resources. First, I introduce the theory and show how policy compression can account for a wide range of behavioral phenomena, from stochasticity and perseveration to state and action chunking. Next, I provide experimental evidence that human decision making indeed reflects a balance between reward maximization and policy compression, resulting in systematic choice biases towards frequently used actions. I show that this behavioral pattern is uniquely explained by the policy compression model, which outperforms competing models that incorporate memory constraints in instrumental learning. I then introduce the notion of conditional policy compression, which expands the theory to include reductions in cognitive cost by conditioning on additional information. Conditional compression explains why action chunking reduces policy complexity without sacrificing reward. I confirm these predictions experimentally, showing that people employ chunking as an adaptive strategy to reduce cognitive load by exploiting temporal structure in the environment. Finally, I search for potential neural substrates that implement cost-sensitive action selection. Using evidence from lesion studies in motor sequence learning, I propose a computational division of labor between the dorsolateral striatum (DLS) and dorsomedial striatum (DMS), two areas of the basal ganglia that balance automatic, history-dependent action selection with costly, flexible behavior. By using a default policy learned by DLS to “regularize” the reward-maximizing policies learned by DMS, the brain achieves a balance between robustness and flexibility in complex behaviors such as motor learning. By taking a resource-rational approach to decision making---that is, by specifying how capacity-limited agents should allocate their computational resources---policy compression offers a new normative lens through which to interpret the peculiarities of behavior and the functional organization of brain areas.
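
To make the trade-off in the abstract concrete, the following is a minimal sketch of the standard information-theoretic formulation used in the policy compression literature (the symbols Q, \beta, P(s), and P(a) are illustrative notation, not an excerpt from the dissertation). The agent maximizes expected reward minus a cost on policy complexity, measured as the mutual information between states and actions:

    \max_{\pi} \; \mathbb{E}\!\left[ Q(s,a) \right] \;-\; \beta^{-1} I(S;A),
    \qquad
    I(S;A) \;=\; \sum_{s} P(s) \sum_{a} \pi(a \mid s)\,
    \log \frac{\pi(a \mid s)}{P(a)},
    \qquad
    P(a) \;=\; \sum_{s} P(s)\, \pi(a \mid s).

The optimal policy then takes a softmax form biased toward the marginal action distribution:

    \pi^{*}(a \mid s) \;\propto\; P(a)\, \exp\{\beta\, Q(s,a)\}.

A small \beta (tight capacity) pushes the policy toward the state-independent marginal P(a), which is one way to understand the frequency-driven choice biases, stochasticity, and perseveration described above; a large \beta recovers the reward-maximizing policy. If the learned marginal P(a) is replaced by a fixed default policy, the complexity term becomes an expected KL divergence from that default, which offers a reading of the proposed DLS-based "regularization" of DMS, though the dissertation's exact formulation may differ.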

Keywords

Decision making, Reinforcement learning, Resource-rationality, Neurosciences, Psychology, Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.
