From Adversarial Imitation Learning to Robust Batch Imitation Learning
Ma, Yecheng Jason
Citation: Ma, Yecheng Jason. 2020. From Adversarial Imitation Learning to Robust Batch Imitation Learning. Bachelor's thesis, Harvard College.
Abstract: Imitation learning (IL) aims to learn a behavior policy by imitating the behavior of an expert. Although IL has achieved high performance in various domains, it lacks an established set of evaluation metrics, which makes it difficult to compare algorithms and identify their shortcomings. This thesis proposes a suite of evaluation metrics for imitation learning and benchmarks Behavior Cloning (BC) and Generative Adversarial Imitation Learning (GAIL), two baseline IL algorithms. Our results challenge the consensus that GAIL is preferable to BC and suggest that any perceived gain is due to a non-standard training methodology employed in prior work. In addition, these evaluations reveal a shortcoming in both algorithms that has not been adequately addressed: they are susceptible to expert data that consists of a mixture of optimal and degraded trajectories. Because expert data is noisy by nature, this significantly hampers the usability of IL in the real world. Building on recent insights from batch reinforcement learning as well as self-supervised reward learning, I propose and study a novel batch imitation learning (BIL) algorithm, Disagreement-Regularized Batch-Constrained-Q Imitation Learning (DRBIL), which learns without any interaction with the environment and is robust to expert data degradation. These properties ensure that DRBIL can learn a good policy without the agent taking risky actions or overfitting to degraded expert trajectories. I instantiate DRBIL in MuJoCo domains and demonstrate state-of-the-art IL performance as well as robustness to data degradation. Together, this thesis takes an important step toward making IL rigorous and suggests a new BIL framework that is widely adaptable and satisfies critical safety desiderata.
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364689
- FAS Theses and Dissertations