Publication: Molecular Property Predictors for Downstream De Novo Generation
Abstract
De novo molecular generation is the task of creating new molecular structures that optimize a given objective, and it is highly useful in drug discovery. A key challenge is that ground-truth evaluation of molecules often cannot be performed frequently, because it must be done in a wet lab and is therefore resource intensive. In this thesis, I propose a de novo generation method that first trains a Graph Neural Network predictor for the property to be optimized and then uses this predictor as the scoring function, in place of ground-truth scoring, in a Graph-Based Genetic Algorithm generation method. I use batch Bayesian Optimization to create the training dataset for the predictor, doing so iteratively and with realistically sized batches. This training dataset also provides high-quality molecules to use as a starting point for the Genetic Algorithm. I evaluate this method on the task of generating molecules that dock well to the human Dopamine Receptor D3. I observe that creating the training dataset with Bayesian Optimization frequently leads, on average, to better top molecules generated by the Genetic Algorithm than creating it randomly. Additionally, the top molecules generated by the Genetic Algorithm are on average better than the top molecules in the predictor's training dataset. This highlights the utility of the Genetic Algorithm as an additional optimization step after Bayesian Optimization.
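The pipeline described in the abstract can be illustrated with a minimal, toy-scale sketch. It is not the thesis implementation, and every name in it is hypothetical. It assumes molecules are fixed-length bit tuples rather than molecular graphs, replaces the Graph Neural Network with a simple k-nearest-neighbor surrogate, replaces docking with a cheap synthetic `oracle` where a higher score is better, and approximates batch Bayesian Optimization by greedily taking the top candidates under an upper-confidence-bound acquisition.

```python
import random

rng = random.Random(0)
N_BITS = 16
# Hidden weights of the toy ground-truth objective (assumption, not real docking).
WEIGHTS = [rng.uniform(-1, 1) for _ in range(N_BITS)]

def oracle(mol):
    """Toy stand-in for the expensive ground-truth score; higher is better."""
    return sum(w for bit, w in zip(mol, WEIGHTS) if bit)

def random_mol():
    return tuple(rng.randint(0, 1) for _ in range(N_BITS))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def predict(mol, data, k=3):
    """k-NN surrogate (stand-in for the GNN predictor).

    Returns the mean score of the k nearest labeled molecules and the
    distance to the nearest one as a crude uncertainty estimate.
    """
    nearest = sorted(data.items(), key=lambda item: hamming(mol, item[0]))[:k]
    mean = sum(score for _, score in nearest) / len(nearest)
    return mean, hamming(mol, nearest[0][0])

def build_dataset(pool, rounds=4, batch_size=8, beta=0.1):
    """Label one batch per round, chosen greedily by an upper confidence bound."""
    data = {m: oracle(m) for m in rng.sample(pool, batch_size)}  # random first batch

    def ucb(m):
        mean, uncertainty = predict(m, data)
        return mean + beta * uncertainty

    for _ in range(rounds):
        unlabeled = [m for m in pool if m not in data]
        batch = sorted(unlabeled, key=ucb, reverse=True)[:batch_size]
        data.update((m, oracle(m)) for m in batch)  # only here is the oracle called
    return data

def mutate(mol):
    i = rng.randrange(N_BITS)
    return mol[:i] + (1 - mol[i],) + mol[i + 1:]

def crossover(a, b):
    cut = rng.randrange(1, N_BITS)
    return a[:cut] + b[cut:]

def genetic_algorithm(data, generations=20, pop_size=16):
    """Evolve molecules scored by the surrogate only, never by the oracle."""
    def score(m):
        return predict(m, data)[0]

    # Seed the population with the best labeled molecules from the dataset.
    population = sorted(data, key=data.get, reverse=True)[:pop_size]
    for _ in range(generations):
        children = [mutate(crossover(*rng.sample(population, 2)))
                    for _ in range(pop_size)]
        population = sorted(set(population + children),
                            key=score, reverse=True)[:pop_size]
    return population

pool = sorted({random_mol() for _ in range(300)})
data = build_dataset(pool)
final_population = genetic_algorithm(data)
print("best labeled score:  ", max(data.values()))
print("best generated score:", max(oracle(m) for m in final_population))
```

In this sketch the oracle is called only while building the dataset and once more at the end to inspect the result, which mirrors the constraint that ground-truth evaluation is scarce. Whether the generated molecules actually beat the labeled ones depends on how well the surrogate generalizes, so the toy run should not be read as reproducing the reported results.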