Publication: ObjectiveGAN: Using Generative Adversarial Networks and Reinforcement Learning to Fine-Tune Sequence Generation Models
Abstract
When generating sequences with recurrent neural networks, naive reinforcement learning can be used to give "hints" that guide the model's generative process towards an arbitrary objective criterion on the output data. For example, when generating music one might reward the model for staying within a single key, and when generating molecule strings one might require them to be valid chemical compounds. Often, this type of heuristic backfires: the model becomes lazy or greedy, generates uninteresting data, and can even fail to improve on the given objective. Traditional approaches tweak the objective function, adding domain-specific penalties and rewards to prevent the model from becoming greedy. While this method has been successful in improving the desired objective, effective reward functions can be hard to craft and rely heavily on domain-specific knowledge. This thesis introduces ObjectiveGAN as a solution to this problem. We employ Generative Adversarial Networks (GANs) to increase the entropy of the generative process and prevent it from being greedy, ultimately improving the objective we are interested in. In contrast with traditional RL methods, which depend on carefully crafted heuristics to work well, ObjectiveGAN also works with simple heuristics by adding a dynamic GAN component to the reward function. This GAN component allows the model to maximize the hard-coded objective while retaining information learned from the training data. We implement ObjectiveGAN for molecule generation in chemistry and show that it can generate a large percentage of new, valid molecules that are not present in the training set.
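The idea of adding a GAN component to the reward function can be illustrated with a minimal sketch. The abstract does not say how the two terms are combined, so the linear mixing and the weight `mix` below are assumptions, and `toy_discriminator` is a hypothetical stand-in for a trained discriminator network (here it simply scores note diversity). The example reuses the music case from the abstract: a greedy sequence that repeats one in-key note maximizes the hard-coded objective, yet scores lower overall than a varied in-key sequence.

```python
from typing import Callable, Sequence

C_MAJOR = {"C", "D", "E", "F", "G", "A", "B"}


def in_key_objective(notes: Sequence[str]) -> float:
    """Hard-coded objective: fraction of notes that lie in C major."""
    if not notes:
        return 0.0
    return sum(n in C_MAJOR for n in notes) / len(notes)


def toy_discriminator(notes: Sequence[str]) -> float:
    """Hypothetical stand-in for a trained discriminator.

    A real discriminator would be a network estimating how likely the
    sequence is to come from the training data. This toy version just
    returns the fraction of distinct notes, so repetitive output scores low.
    """
    if not notes:
        return 0.0
    return len(set(notes)) / len(notes)


def combined_reward(
    notes: Sequence[str],
    objective: Callable[[Sequence[str]], float],
    discriminator: Callable[[Sequence[str]], float],
    mix: float = 0.5,
) -> float:
    """Assumed linear mix of the discriminator score and the objective."""
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    return mix * discriminator(notes) + (1.0 - mix) * objective(notes)


greedy = list("CCCCCCCC")   # stays in key by repeating one note
varied = list("CDEFGABC")   # stays in key and uses seven distinct notes

greedy_reward = combined_reward(greedy, in_key_objective, toy_discriminator)
varied_reward = combined_reward(varied, in_key_objective, toy_discriminator)
print(greedy_reward, varied_reward)
```

With `mix = 0` the reward reduces to the plain objective and both sequences tie, which is the greedy failure mode the abstract describes; any positive `mix` breaks the tie in favor of the varied sequence.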