Person:

Gershman, Samuel

Loading...
Profile Picture

Email Address

AA Acceptance Date

Birth Date

Research Projects

Organizational Units

Job Title

Last Name

Gershman

First Name

Samuel

Name

Gershman, Samuel

Search Results

Now showing 1 - 10 of 11
  • Publication

    A Unifying Probabilistic View of Associative Learning

    (Public Library of Science, 2015) Gershman, Samuel

    Two important ideas about associative learning have emerged in recent decades: (1) Animals are Bayesian learners, tracking their uncertainty about associations; and (2) animals acquire long-term reward predictions through reinforcement learning. Both of these ideas are normative, in the sense that they are derived from rational design principles. They are also descriptive, capturing a wide range of empirical phenomena that troubled earlier theories. This article describes a unifying framework encompassing Bayesian and reinforcement learning theories of associative learning. Each perspective captures a different aspect of associative learning, and their synthesis offers insight into phenomena that neither perspective can explain on its own.

  • Publication

    Plans, Habits, and Theory of Mind

    (Public Library of Science, 2016) Gershman, Samuel; Gerstenberg, Tobias; Baker, Chris L.; Cushman, Fiery

    Human success and even survival depends on our ability to predict what others will do by guessing what they are thinking. If I accelerate, will he yield? If I propose, will she accept? If I confess, will they forgive? Psychologists call this capacity “theory of mind.” According to current theories, we solve this problem by assuming that others are rational actors. That is, we assume that others design and execute efficient plans to achieve their goals, given their knowledge. But if this view is correct, then our theory of mind is startlingly incomplete. Human action is not always a product of rational planning, and we would be mistaken to always interpret others’ behaviors as such. A wealth of evidence indicates that we often act habitually—a form of behavioral control that depends not on rational planning, but rather on a history of reinforcement. We aim to test whether the human theory of mind includes a theory of habitual action and to assess when and how it is deployed. In a series of studies, we show that human theory of mind is sensitive to factors influencing the balance between habitual and planned behavior.

  • Publication

    When Does Model-Based Control Pay Off?

    (Public Library of Science, 2016) Kool, Wouter; Cushman, Fiery; Gershman, Samuel

    Many accounts of decision making and reinforcement learning posit the existence of two distinct systems that control choice: a fast, automatic system and a slow, deliberative system. Recent research formalizes this distinction by mapping these systems to “model-free” and “model-based” strategies in reinforcement learning. Model-free strategies are computationally cheap, but sometimes inaccurate, because action values can be accessed by inspecting a look-up table constructed through trial-and-error. In contrast, model-based strategies compute action values through planning in a causal model of the environment, which is more accurate but also more cognitively demanding. It is assumed that this trade-off between accuracy and computational demand plays an important role in the arbitration between the two strategies, but we show that the hallmark task for dissociating model-free and model-based strategies, as well as several related variants, do not embody such a trade-off. We describe five factors that reduce the effectiveness of the model-based strategy on these tasks by reducing its accuracy in estimating reward outcomes and decreasing the importance of its choices. Based on these observations, we describe a version of the task that formally and empirically obtains an accuracy-demand trade-off between model-free and model-based strategies. Moreover, we show that human participants spontaneously increase their reliance on model-based control on this task, compared to the original paradigm. Our novel task and our computational analyses may prove important in subsequent empirical investigations of how humans balance accuracy and demand.

  • Publication

    The computational nature of memory modification

    (eLife Sciences Publications, Ltd, 2017) Gershman, Samuel; Monfils, Marie-H; Norman, Kenneth A; Niv, Yael

    Retrieving a memory can modify its influence on subsequent behavior. We develop a computational theory of memory modification, according to which modification of a memory trace occurs through classical associative learning, but which memory trace is eligible for modification depends on a structure learning mechanism that discovers the units of association by segmenting the stream of experience into statistically distinct clusters (latent causes). New memories are formed when the structure learning mechanism infers that a new latent cause underlies current sensory observations. By the same token, old memories are modified when old and new sensory observations are inferred to have been generated by the same latent cause. We derive this framework from probabilistic principles, and present a computational implementation. Simulations demonstrate that our model can reproduce the major experimental findings from studies of memory modification in the Pavlovian conditioning literature. DOI: http://dx.doi.org/10.7554/eLife.23763.001

  • Publication

    Predictive representations can link model-based reinforcement learning to model-free mechanisms

    (Public Library of Science, 2017) Russek, Evan M.; Momennejad, Ida; Botvinick, Matthew M.; Gershman, Samuel; Daw, Nathaniel D.

    Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.

  • Publication

    Dopamine reward prediction errors reflect hidden state inference across time

    (2017) Starkweather, Clara; Babayan, Benedicte; Uchida, Naoshige; Gershman, Samuel

    Midbrain dopamine neurons signal reward prediction error (RPE), or actual minus expected reward. The temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead compute a value signal based on an inferred distribution of hidden states (a ‘belief state’). In this work, we asked whether dopaminergic signaling supports a TD learning framework that operates over hidden states. We found that dopamine signaling exhibited a striking difference between two tasks that differed only with respect to whether reward was delivered deterministically. Our results favor an associative learning rule that combines cached values with hidden state inference.

  • Publication

    Toward a universal decoder of linguistic meaning from brain activation

    (Nature Publishing Group UK, 2018) Pereira, Francisco; Lou, Bin; Pritchett, Brianna; Ritter, Samuel; Gershman, Samuel; Kanwisher, Nancy; Botvinick, Matthew; Fedorenko, Evelina

    Prior work decoding linguistic meaning from imaging data has been largely limited to concrete nouns, using similar stimuli for training and testing, from a relatively small number of semantic categories. Here we present a new approach for building a brain decoding system in which words and sentences are represented as vectors in a semantic space constructed from massive text corpora. By efficiently sampling this space to select training stimuli shown to subjects, we maximize the ability to generalize to new meanings from limited imaging data. To validate this approach, we train the system on imaging data of individual concepts, and show it can decode semantic vector representations from imaging data of sentences about a wide variety of both concrete and abstract topics from two separate datasets. These decoded representations are sufficiently detailed to distinguish even semantically similar sentences, and to capture the similarity structure of meaning relationships between sentences.

  • Publication

    Empowerment contributes to exploration behaviour in a creative video game

    (Springer-Nature, 2022-01-14) Brändle, Franziska; Stocks, Lena; Tenenbaum, Joshua; Gershman, Samuel; Schulz, Eric

    Studies of human exploration frequently cast people as serendipitously stumbling upon good options. Yet these studies may not capture the richness of exploration strategies that people exhibit in more complex environments. We study human behavior in a large data set of 29,493 players of the richly-structured online game "Little Alchemy 2''. In this game, players start with four elements, which they can combine to create up to 720 complex objects. We find that players are driven to create objects that empower them to create even more objects. We find that this drive for empowerment is eliminated when people play a version of the game that lacks recognizable semantics, indicating that they use their knowledge about the world to guide their exploration. Our results suggest that the drive for empowerment may be a potent source of intrinsic motivation in richly structured domains, particularly those that lack explicit reward signals.

  • Publication

    Multi-Task Reinforcement Learning in Humans

    (Nature publishing group, 2021-01-28) Tomov, Momchil; Schulz, Eric; Gershman, Samuel

    The ability to transfer knowledge across tasks and generalize to novel ones is an important hallmark of human intelligence. Yet not much is known about human multi-task reinforcement learning. We study participants’ behavior in a novel two-step decision making task with multiple features and changing reward functions. We compare their behavior to two state-of-the-art algorithms for multi-task reinforcement learning, one that maps previous policies and encountered features to new reward functions and one that approximates value functions across tasks, as well as to standard model-based and model-free algorithms. Across three exploratory experiments and a large preregistered experiment, our results provide strong evidence for a strategy that maps previously learned policies to novel scenarios. These results enrich our understanding of human reinforcement learning in complex environments with changing task demands.

  • Publication

    Competition and Cooperation Between Multiple Reinforcement Learning Systems

    (Elsevier, 2018) Kool, Wouter; Cushman, Fiery; Gershman, Samuel

    Most psychological research on reinforcement learning has depicted two systems locked in battle for control of behavior: a flexible but computationally expensive “model-based” system and an inflexible but cheap “model-free” system. However, the complete picture is more complex, with the two systems cooperating in myriad ways. We focus on two issues at the frontier of this research program. First, how is the conflict between these systems adjudicated? Second, how can the systems be combined to harness the relative strengths of each? This chapter reviews recent work on competition and cooperation between the two systems, highlighting the computational principles that govern different forms of interaction.