Policy Teaching Through Reward Function Learning

Title: Policy Teaching Through Reward Function Learning
Author: Zhang, Haoqi; Parkes, David C.; Chen, Yiling

Note: Order does not necessarily reflect citation order of authors.

Citation: Zhang, Haoqi, David C. Parkes, and Yiling Chen. 2009. Policy teaching through reward function learning. In Proceedings of the tenth ACM Conference on Electronic Commerce: July 6-10, 2009, Stanford, California, ed. J. Chuang, 295-304. New York: ACM Press.
Access Status: Full text of the requested work is not available in DASH at this time (“dark deposit”).
Abstract: Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known and unknown to the interested party, presenting a linear program for the former case and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad-network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
Published Version: http://portal.acm.org/citation.cfm?id=1566417&dl=ACM
Other Sources: http://www.eecs.harvard.edu/econcs/pubs/zhangec09.pdf
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:3996846