Publication:

Learning to See Agents with Deep Variational Inference

Date

2025-05-16

The Harvard community has made this article openly available.

Citation

Muppidi, Aneesh Chenna Reddy. 2025. Learning to See Agents with Deep Variational Inference. Bachelors Thesis, Harvard University Engineering and Applied Sciences.

Abstract

Unsupervised agent discovery is the ability to identify and model intentional agents from raw perceptual data without explicit supervision. While neurocognitive theories propose different neural mechanisms for agent perception, including mirror neurons and the superior temporal sulcus (STS), we lack computational algorithms that can fully describe agent perception. Existing computational models of agent perception operate on simplified symbolic inputs rather than the raw perceptual data that biological systems process. We introduce a variational objective, L_VAD, that formulates vision-based agent discovery as structured inference over latent actions. Based on L_VAD, we implement a deep conditional slot-based variational autoencoder called the Variational Agent Discovery (VAD) model. Our model learns internal agent representations directly from raw pixel-based observations, outperforming baselines on predictive tasks, including agent action and goal inference, in three video-game settings. VAD's internal representations generalize robustly to novel agents and environmental configurations, demonstrating up to a 33% advantage in transfer scenarios. The VAD model exhibits predictive capabilities analogous to those observed in infant cognition studies, correctly predicting that agents will take efficient paths to goals when environmental constraints change. Analysis of the learned representations reveals a functional decomposition of visual scenes along agent-centric lines, with certain neural features exhibiting human mirror-neuron-like activation patterns across different agents performing the same actions. When incorporated as an auxiliary loss in multi-agent reinforcement learning, our VAD objective improves sample efficiency by 21.8% and final performance by 7.6%.

Keywords

Agent Perception, Reinforcement Learning, Variational Inference, Artificial intelligence, Cognitive psychology, Neurosciences

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
