Publication: ObjectiveGAN: Using Generative Adversarial Networks and Reinforcement Learning to Fine-Tune Sequence Generation Models
Date
2017-07-14
The Harvard community has made this article openly available.
Abstract
When generating sequences with recurrent neural networks, naive reinforcement learning can be used to give "hints" and guide the model's generative process towards an arbitrary objective criterion on the output data. For example, when generating music, one might reward the model for staying within a single key; when generating molecule strings, one might reward it for producing valid chemical compounds.
Often, this type of heuristic backfires by leading the model to become lazy or greedy, generating uninteresting data and even failing to improve on the given objective. Traditional approaches have tweaked the objective function by adding domain-specific penalties and rewards to prevent the model from becoming greedy. While this method has been successful in improving the desired objective, effective reward functions can be hard to craft and rely heavily on domain-specific knowledge.
This thesis introduces ObjectiveGAN as a solution to this problem. We employ Generative Adversarial Networks (GANs) to increase the entropy of the generative process and prevent it from being greedy, ultimately improving the objective we are interested in. In contrast with traditional RL methods, which depend on carefully crafted heuristics to work well, ObjectiveGAN also works with simple heuristics by adding a dynamic GAN component to the reward function. This GAN component allows the model to maximize the hard-coded objective while retaining information learned from the training data.
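As a rough illustration of a reward that adds a GAN component to a hard-coded objective, the sketch below mixes the two scores with a weight. The convex combination, the name `combined_reward`, the weight `lam`, and both toy scoring functions are assumptions made for this example; the abstract does not specify how ObjectiveGAN actually combines the terms.

```python
from typing import Callable


def combined_reward(
    sequence: str,
    objective: Callable[[str], float],
    discriminator: Callable[[str], float],
    lam: float = 0.5,
) -> float:
    """Mix a discriminator score with a hard-coded objective score.

    Assumption: both scores lie in [0, 1] and are combined as
    lam * D(x) + (1 - lam) * O(x). This weighting is illustrative only.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * discriminator(sequence) + (1.0 - lam) * objective(sequence)


def toy_objective(sequence: str) -> float:
    """Hypothetical stand-in for a simple heuristic: balanced parentheses."""
    depth = 0
    for ch in sequence:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return 0.0
    return 1.0 if depth == 0 else 0.0


def toy_discriminator(sequence: str) -> float:
    """Hypothetical stand-in for a trained discriminator's 'looks real' score."""
    return 0.5


reward = combined_reward("CC(C)O", toy_objective, toy_discriminator, lam=0.5)
```

With `lam = 0` the reward reduces to the plain heuristic, and with `lam = 1` it reduces to the discriminator score alone, so the weight controls how strongly the generator is pulled towards the training data.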
We implement ObjectiveGAN for molecule generation in chemistry and show that it can generate a large percentage of new, valid molecules that are not present in the training set.
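One way to measure the share of generated molecules that are both valid and absent from the training set is sketched below. The function name `valid_novel_fraction` and the toy validity check are hypothetical; a real evaluation would use a chemistry toolkit to test validity, and the thesis may define its metric differently.

```python
from typing import Callable, Iterable, Set


def valid_novel_fraction(
    generated: Iterable[str],
    training_set: Set[str],
    is_valid: Callable[[str], bool],
) -> float:
    """Fraction of generated strings that are valid and not in the training set.

    Assumption: every generated sample counts once, duplicates included.
    """
    samples = list(generated)
    if not samples:
        return 0.0
    hits = sum(1 for s in samples if is_valid(s) and s not in training_set)
    return hits / len(samples)


def balanced_parentheses(sequence: str) -> bool:
    """Toy validity check (an assumption): parentheses must be balanced."""
    depth = 0
    for ch in sequence:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


generated = ["CCO", "C(C", "CCN", "CCO"]
training = {"CCO"}
fraction = valid_novel_fraction(generated, training, balanced_parentheses)
```

Here only `"CCN"` is both valid under the toy check and missing from the training set, so one of the four samples counts.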
Keywords
Computer Science, Chemistry, General
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service