Publication: Policy Teaching Through Reward Function Learning
Date
2009
Publisher
Association for Computing Machinery
The Harvard community has made this article openly available.
Citation
Zhang, Haoqi, David C. Parkes, and Yiling Chen. 2009. Policy teaching through reward function learning. In Proceedings of the Tenth ACM Conference on Electronic Commerce: July 6-10, 2009, Stanford, California, ed. J. Chuang, 295-304. New York: ACM Press.
Abstract
Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad-network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
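To illustrate the known-reward case, the following is a minimal sketch of an incentive-provision linear program. It is a hypothetical, simplified example, not the paper's actual formulation: it assumes a myopic agent who picks the highest-reward action in each state, and solves for the smallest nonnegative incentives `delta[s]` on the target action so that it strictly dominates every alternative by a margin `eps`. All names (`R`, `target`, `eps`) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical reward table: R[s, a] is the agent's reward for
# taking action a in state s (myopic agent, for simplicity).
R = np.array([[1.0, 3.0],   # state 0: rewards for actions 0, 1
              [2.0, 0.5]])  # state 1
target = [0, 1]             # desired (target) action in each state
eps = 0.1                   # strictness margin

n_states, n_actions = R.shape

# In each state, the incentive must close the gap to the best
# competing action: delta[s] >= max_{a != target} R[s, a]
#                               - R[s, target[s]] + eps.
gaps = []
for s in range(n_states):
    others = [R[s, a] for a in range(n_actions) if a != target[s]]
    gaps.append(max(others) - R[s, target[s]] + eps)

# LP: minimize total incentive, subject to delta[s] >= gaps[s]
# (expressed as -delta[s] <= -gaps[s]) and delta[s] >= 0.
res = linprog(c=np.ones(n_states),
              A_ub=-np.eye(n_states),
              b_ub=-np.array(gaps),
              bounds=[(0, None)] * n_states)
delta = res.x
print(delta)  # minimal incentives per state
```

In this toy instance the LP reduces to a per-state gap computation; the paper's full setting, with long-run MDP value functions and occupancy constraints, requires a richer linear program, but the shape of the problem, minimizing incentive spend subject to the target policy being optimal for the agent, is the same.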
Keywords
active indirect elicitation, environment design, policy teaching, preference elicitation, preference learning
Terms of Use
Metadata Only