Publication:
ObjectiveGAN: Using Generative Adversarial Networks and Reinforcement Learning to Fine-Tune Sequence Generation Models

Date

2017-07-14

Abstract

When generating sequences with recurrent neural networks, naive reinforcement learning can be used to give "hints" that guide the model's generative process toward an arbitrary objective criterion on the output data. For example, when generating music one might reward the model for staying within a single key, and when generating molecule strings one might want them to be valid chemical compounds. Very often, this type of heuristic backfires: the model becomes lazy or greedy, generating uninteresting data and even failing to improve on the given objective. Traditional approaches tweak the objective function by adding domain-specific penalties and rewards to keep the model from becoming greedy. While this method has been successful in improving the desired objective, effective reward functions can be hard to craft and rely heavily on domain-specific knowledge. This thesis introduces ObjectiveGAN as a solution to this problem. We employ Generative Adversarial Networks (GANs) to increase the entropy of the generative process and prevent it from becoming greedy, ultimately improving the objective we are interested in. In contrast with traditional RL methods, which depend on carefully crafted heuristics to work well, ObjectiveGAN also works with simple heuristics because it adds a dynamic GAN component to the reward function. This GAN component allows the model to maximize the hard-coded objective while retaining information learned from the training data. We implement ObjectiveGAN in the context of chemical molecules and show that it can generate a large percentage of new valid molecules that are not present in the training set.
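As a concrete illustration of the blended reward described in the abstract, the sketch below (not taken from the thesis) fine-tunes a toy character-level generator with REINFORCE, where the per-sequence reward mixes a hard-coded objective score with a GAN discriminator score as reward = LAM * objective(x) + (1 - LAM) * D(x). The vocabulary, network sizes, the objective_reward stand-in, and the mixing weight LAM are all illustrative assumptions.

# Hypothetical sketch (not the thesis code): blending a hard-coded objective
# with a GAN discriminator score as the reward for REINFORCE fine-tuning of
# a character-level sequence generator.
import torch
import torch.nn as nn

VOCAB = list("CNOF()=#123456789")          # toy SMILES-like alphabet (assumption)
V, HIDDEN, MAX_LEN, LAM = len(VOCAB), 64, 20, 0.5

class Generator(nn.Module):
    """Autoregressive GRU that emits one token distribution per step."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(V, HIDDEN)
        self.gru = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, V)

    def sample(self, batch):
        tok = torch.zeros(batch, dtype=torch.long)      # start token (index 0)
        h = None
        seqs, logps = [], []
        for _ in range(MAX_LEN):
            x = self.embed(tok).unsqueeze(1)
            y, h = self.gru(x, h)
            dist = torch.distributions.Categorical(logits=self.out(y[:, -1]))
            tok = dist.sample()
            seqs.append(tok)
            logps.append(dist.log_prob(tok))
        return torch.stack(seqs, dim=1), torch.stack(logps, dim=1)

def objective_reward(seqs):
    """Stand-in for the hard-coded objective (e.g. a validity or property check).
    Here: fraction of digit characters per sequence, purely for illustration."""
    chars = [[VOCAB[i] for i in row] for row in seqs.tolist()]
    return torch.tensor([sum(c.isdigit() for c in row) / len(row) for row in chars])

discriminator = nn.Sequential(             # D(x): probability the sequence looks "real"
    nn.Embedding(V, HIDDEN), nn.Flatten(),
    nn.Linear(HIDDEN * MAX_LEN, 1), nn.Sigmoid())

gen = Generator()
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

# One REINFORCE step: reward = LAM * objective + (1 - LAM) * discriminator score.
seqs, logps = gen.sample(batch=16)
with torch.no_grad():
    reward = LAM * objective_reward(seqs) + (1 - LAM) * discriminator(seqs).squeeze(1)
baseline = reward.mean()                                  # variance-reduction baseline
loss = -((reward - baseline).unsqueeze(1) * logps).sum(dim=1).mean()
opt.zero_grad(); loss.backward(); opt.step()

In this formulation the discriminator term discourages the generator from collapsing onto a few degenerate sequences that score well on the hard-coded objective, since such outputs would look unrealistic to D(x); in a full training loop the discriminator would be retrained alternately on real and generated data.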

Description

Other Available Sources

Keywords

Computer Science, Chemistry, General

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service.
