Publication: From Adversarial Imitation Learning to Robust Batch Imitation Learning
Date
2020-06-17
The Harvard community has made this article openly available.
Citation
Ma, Yecheng Jason. 2020. From Adversarial Imitation Learning to Robust Batch Imitation Learning. Bachelor's thesis, Harvard College.
Abstract
Imitation learning (IL) aims to learn a behavior policy by imitating the behavior of an expert. Although IL has achieved high performance in various domains, it lacks an established set of evaluation metrics, which makes comparing algorithms and identifying their shortcomings difficult. This thesis proposes a suite of evaluation metrics for imitation learning and benchmarks Behavior Cloning (BC) and Generative Adversarial Imitation Learning (GAIL), two baseline IL algorithms. Our results challenge the consensus that GAIL is preferable to BC and suggest that any perceived gain is due to a non-standard training methodology employed in prior work. In addition, these evaluations uncover a shortcoming in both algorithms that has not been adequately addressed: they are susceptible to expert data that consists of a mixture of optimal and degraded trajectories. Because expert data is noisy in practice, this significantly hampers the usability of IL in the real world. Building on recent insights from batch reinforcement learning as well as self-supervised reward learning, I propose and study a novel batch imitation learning (BIL) algorithm, Disagreement-Regularized Batch-Constrained-Q Imitation Learning (DRBIL), which learns without any interaction with the environment and is robust to expert data degradation. These properties ensure that DRBIL can learn a good policy without the agent taking risky actions or overfitting to degraded expert trajectories. I instantiate DRBIL in MuJoCo domains and demonstrate state-of-the-art IL performance as well as robustness to data degradation. Taken together, these contributions take an important step toward making IL rigorous and suggest a new BIL framework that is widely adaptable and satisfies critical safety desiderata.
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service