Policy Teaching Through Reward Function Learning
Citation: Zhang, Haoqi, David C. Parkes, and Yiling Chen. 2009. Policy teaching through reward function learning. In Proceedings of the Tenth ACM Conference on Electronic Commerce, July 6-10, 2009, Stanford, California, ed. J. Chuang, 295-304. New York: ACM Press.
Abstract: Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial-time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad-network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
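The known-reward case described in the abstract can be sketched as a linear program: with the target policy fixed, the agent's value function under that policy is linear in the offered incentives, so one can minimize the total incentive subject to the target policy being strictly optimal for the agent. The following is a minimal illustrative sketch, not the paper's exact formulation; the toy 2-state MDP, the self-loop transitions, the margin `eps`, and the uniform cost on all incentives are all assumptions made here for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state, 2-action MDP (all numbers are illustrative).
# The agent's known reward R makes action 0 attractive; the interested
# party wants to induce the target policy pi*(s) = 1 in every state.
n_s, n_a = 2, 2
gamma, eps = 0.9, 0.01             # agent's discount factor, strictness margin
R = np.array([[1.0, 0.0],          # R[s, a]: agent prefers a = 0 everywhere
              [1.0, 0.0]])
P = np.zeros((n_s, n_a, n_s))      # P[s, a, s']: self-loop transitions
for s in range(n_s):
    for a in range(n_a):
        P[s, a, s] = 1.0
pi = [1, 1]                        # target policy to induce

# LP variables x = [V(0), ..., V(n_s-1), Delta(s, a) for each (s, a)]
nV = n_s
def d_idx(s, a): return nV + s * n_a + a
n_var = nV + n_s * n_a

# Objective: minimize the total incentive offered.
c = np.zeros(n_var)
c[nV:] = 1.0

# Equalities (Bellman under pi*):
#   V(s) = R(s, pi(s)) + Delta(s, pi(s)) + gamma * sum_s' P(s, pi(s), s') V(s')
A_eq, b_eq = [], []
for s in range(n_s):
    row = np.zeros(n_var)
    row[s] = 1.0
    row[:nV] -= gamma * P[s, pi[s]]
    row[d_idx(s, pi[s])] = -1.0
    A_eq.append(row); b_eq.append(R[s, pi[s]])

# Inequalities: every off-policy action is worse by at least eps:
#   R(s, a) + Delta(s, a) + gamma * P(s, a, .) V + eps <= V(s)
A_ub, b_ub = [], []
for s in range(n_s):
    for a in range(n_a):
        if a == pi[s]:
            continue
        row = np.zeros(n_var)
        row[:nV] = gamma * P[s, a]
        row[s] -= 1.0
        row[d_idx(s, a)] = 1.0
        A_ub.append(row); b_ub.append(-R[s, a] - eps)

bounds = [(None, None)] * nV + [(0, None)] * (n_s * n_a)  # Delta >= 0
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
delta = res.x[nV:].reshape(n_s, n_a)
print(delta)   # incentive placed on each (state, action) pair
```

In this toy instance the solver places an incentive of `1 + eps` on the target action in each state, just enough to overcome the agent's preference for action 0; the unknown-reward case in the paper must instead elicit the constraints of such a program indirectly from observed behavior.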
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:3996846
Collections: FAS Scholarly Articles