From Adversarial Imitation Learning to Robust Batch Imitation Learning

Date

2020-06-17

Citation

Ma, Yecheng Jason. 2020. From Adversarial Imitation Learning to Robust Batch Imitation Learning. Bachelor's thesis, Harvard College.

Abstract

Imitation learning (IL) aims to learn a behavior policy by imitating the behavior of an expert. Although IL has achieved high performance in various domains, it lacks an established set of evaluation metrics, which makes it difficult to compare algorithms and identify their shortcomings. This thesis proposes a suite of evaluation metrics for imitation learning and uses it to benchmark two baseline IL algorithms, Behavior Cloning (BC) and Generative Adversarial Imitation Learning (GAIL). Our results challenge the consensus that GAIL is preferable to BC, and argue that any perceived gain is due to a non-standard training methodology employed in prior work. In addition, these evaluations uncover a shortcoming in both algorithms that has not been adequately addressed: they are susceptible to expert data that consists of a mixture of optimal and degraded trajectories. Given the noisy nature of expert data, this significantly hampers the usability of IL in the real world. Building on recent insights from batch reinforcement learning (BRL) as well as self-supervised reward learning, I propose and study a novel batch imitation learning (BIL) algorithm, Disagreement-Regularized Batch-Constrained-Q Imitation Learning (DRBIL), which learns without any interaction with the environment and is robust to expert data degradation. These properties ensure that DRBIL can learn a good policy without the agent taking risky actions or overfitting to degraded expert trajectories. I instantiate DRBIL in MuJoCo domains and demonstrate state-of-the-art IL performance as well as robustness to data degradation. Taken together, this thesis takes an important step toward making IL rigorous and suggests a new BIL framework that is widely adaptable and satisfies critical safety desiderata.
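Why a mixture of optimal and degraded trajectories hurts plain behavior cloning can be seen in a toy setting. The sketch below is an illustration only, not code from the thesis: it assumes a one-dimensional state, a linear policy a = w * s fit by least squares (behavior cloning treated as supervised regression), and hypothetical optimal (a = 2s) and degraded (a = -2s) demonstrators. The helper names `make_dataset` and `fit_bc_linear` are invented for this example.

```python
from typing import List, Tuple


def make_dataset(n_optimal: int, n_degraded: int,
                 states: List[float]) -> Tuple[List[float], List[float]]:
    """Build a toy demonstration set mixing optimal and degraded trajectories.

    Hypothetical setup (not from the thesis): the optimal expert plays
    a = 2 * s, a degraded demonstrator plays a = -2 * s, and every
    "trajectory" visits the same list of states.
    """
    S: List[float] = []
    A: List[float] = []
    for _ in range(n_optimal):
        for s in states:
            S.append(s)
            A.append(2.0 * s)
    for _ in range(n_degraded):
        for s in states:
            S.append(s)
            A.append(-2.0 * s)
    return S, A


def fit_bc_linear(states: List[float], actions: List[float]) -> float:
    """Behavior cloning as regression: least-squares slope w for a = w * s."""
    num = sum(s * a for s, a in zip(states, actions))
    den = sum(s * s for s in states)
    return num / den


if __name__ == "__main__":
    grid = [1.0, 2.0, 3.0]
    for n_deg in range(4):
        w = fit_bc_linear(*make_dataset(3, n_deg, grid))
        print(f"{n_deg} degraded of {3 + n_deg}: learned w = {w:.2f} (expert w = 2.00)")
```

Because least-squares regression averages the demonstrated actions at each state, every degraded trajectory pulls the cloned policy toward the degraded behavior: with three optimal trajectories, the learned slope falls from 2.0 with no degraded data to 1.0 with one degraded trajectory and 0.0 with three. The sketch only illustrates the failure mode described in the abstract; it does not implement DRBIL.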

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.