Policy Teaching Through Reward Function Learning
Citation: Zhang, Haoqi, David C. Parkes, and Yiling Chen. 2009. Policy teaching through reward function learning. In Proceedings of the Tenth ACM Conference on Electronic Commerce, July 6-10, 2009, Stanford, California, ed. J. Chuang, 295-304. New York: ACM Press.
Abstract: Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial-time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad-network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
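The known-reward case described in the abstract can be sketched as a linear program: with the target policy fixed, the agent's value function under that policy is linear in the offered incentives, so one can minimize the total incentive subject to the target policy being strictly optimal for the agent. The following is a minimal illustrative sketch, not the paper's exact formulation; the toy 2-state MDP, the self-loop transitions, the margin `eps`, and the uniform cost on all incentives are all assumptions made here for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state, 2-action MDP (all numbers are illustrative).
# The agent's known reward R makes action 0 attractive; the interested
# party wants to induce the target policy pi*(s) = 1 in every state.
n_s, n_a = 2, 2
gamma, eps = 0.9, 0.01             # agent's discount factor, strictness margin
R = np.array([[1.0, 0.0],          # R[s, a]: agent prefers a = 0 everywhere
              [1.0, 0.0]])
P = np.zeros((n_s, n_a, n_s))      # P[s, a, s']: self-loop transitions
for s in range(n_s):
    for a in range(n_a):
        P[s, a, s] = 1.0
pi = [1, 1]                        # target policy to induce

# LP variables x = [V(0), ..., V(n_s-1), Delta(s, a) for each (s, a)]
nV = n_s
def d_idx(s, a): return nV + s * n_a + a
n_var = nV + n_s * n_a

# Objective: minimize the total incentive offered.
c = np.zeros(n_var)
c[nV:] = 1.0

# Equalities (Bellman under pi*):
#   V(s) = R(s, pi(s)) + Delta(s, pi(s)) + gamma * sum_s' P(s, pi(s), s') V(s')
A_eq, b_eq = [], []
for s in range(n_s):
    row = np.zeros(n_var)
    row[s] = 1.0
    row[:nV] -= gamma * P[s, pi[s]]
    row[d_idx(s, pi[s])] = -1.0
    A_eq.append(row); b_eq.append(R[s, pi[s]])

# Inequalities: every off-policy action is worse by at least eps:
#   R(s, a) + Delta(s, a) + gamma * P(s, a, .) V + eps <= V(s)
A_ub, b_ub = [], []
for s in range(n_s):
    for a in range(n_a):
        if a == pi[s]:
            continue
        row = np.zeros(n_var)
        row[:nV] = gamma * P[s, a]
        row[s] -= 1.0
        row[d_idx(s, a)] = 1.0
        A_ub.append(row); b_ub.append(-R[s, a] - eps)

bounds = [(None, None)] * nV + [(0, None)] * (n_s * n_a)  # Delta >= 0
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
delta = res.x[nV:].reshape(n_s, n_a)
print(delta)   # incentive placed on each (state, action) pair
```

In this toy instance the solver places an incentive of `1 + eps` on the target action in each state, just enough to overcome the agent's preference for action 0; the unknown-reward case in the paper must instead elicit the constraints of such a program indirectly from observed behavior.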
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:3996846
Collections: FAS Scholarly Articles