| Title: | Policy Teaching Through Reward Function Learning |
| Author: | Zhang, Haoqi; Parkes, David C.; Chen, Yiling (Note: Order does not necessarily reflect citation order of authors.) |
| Citation: | Zhang, Haoqi, David C. Parkes, and Yiling Chen. 2009. Policy teaching through reward function learning. In Proceedings of the tenth ACM Conference on Electronic Commerce : July 6-10, 2009, Stanford, California, ed. J. Chuang, 295-304. New York: ACM Press. |
| Access Status: | At the direction of the depositing author this work is not currently accessible through DASH. |
| Full Text & Related Files: | Zhang_Policy.pdf (241.8Kb; PDF) |
| Abstract: | Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known and unknown to the interested party, presenting a linear program for the former case and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad-network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents. |
| Published Version: | http://portal.acm.org/citation.cfm?id=1566417&dl=ACM |
| Other Sources: | http://www.eecs.harvard.edu/econcs/pubs/zhangec09.pdf |
| Citable link to this page: | http://nrs.harvard.edu/urn-3:HUL.InstRepos:3996846 |