Molecular Property Predictors for Downstream De Novo Generation

Date

2022-05-23

Citation

Tekur, Varun. 2022. Molecular Property Predictors for Downstream De Novo Generation. Bachelor's thesis, Harvard College.

Abstract

The task of de novo molecular generation involves creating new molecular structures that optimize a given objective, and it is extremely useful in drug discovery. A key challenge is that ground-truth evaluation of molecules often cannot be performed frequently, because it must be done in a wet lab and is therefore resource intensive. In this thesis, I propose a de novo generation method that first trains a Graph Neural Network predictor for the property to be optimized and then uses this predictor as the scoring function (instead of ground-truth scoring) in a Graph-Based Genetic Algorithm generation method. I use batch Bayesian Optimization to create the training dataset for the predictor, doing so iteratively and with realistically sized batches. This training dataset also provides high-quality molecules to use as a starting point for the Genetic Algorithm. I evaluate this method on the task of generating molecules that dock well to the human Dopamine Receptor D3. I observe that creating the training dataset with Bayesian Optimization, rather than by random selection, frequently leads the Genetic Algorithm to generate better top molecules on average. Additionally, the top molecules generated by the Genetic Algorithm are on average better than the top molecules in the predictor's training dataset. This highlights the utility of the Genetic Algorithm as an additional optimization step after Bayesian Optimization.
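
The three stages described in the abstract (iterative batch dataset creation, a learned predictor, and a genetic algorithm scored by that predictor) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the thesis's implementation: molecules are stood in for by 16-bit tuples, `oracle` is a hypothetical placeholder for the expensive docking evaluation, a k-nearest-neighbour average replaces the Graph Neural Network, and a simple upper-confidence-bound heuristic with a distance-based novelty term replaces a proper Bayesian Optimization acquisition function.

```python
import random

N_BITS = 16  # toy "molecule": a tuple of 16 bits (assumption, not a molecular graph)

def oracle(mol):
    """Hypothetical expensive ground-truth score (stand-in for docking); higher is better."""
    return sum(mol) + 2 * sum(a & b for a, b in zip(mol, mol[1:]))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def predict(mol, data, k=3):
    """k-NN surrogate (stand-in for the GNN predictor): returns (mean score, novelty)."""
    nearest = sorted(data, key=lambda m: hamming(mol, m))[:k]
    mean = sum(data[m] for m in nearest) / len(nearest)
    return mean, hamming(mol, nearest[0])  # distance to closest labelled point

def acquire_batch(pool, data, batch_size, beta=1.0):
    """UCB-style batch selection: predicted score plus a bonus for unexplored regions."""
    def ucb(m):
        mean, novelty = predict(m, data)
        return mean + beta * novelty
    return sorted(pool, key=ucb, reverse=True)[:batch_size]

def random_pool(rng, size):
    """Draw `size` distinct random toy molecules, preserving insertion order."""
    pool = {}
    while len(pool) < size:
        pool[tuple(rng.randint(0, 1) for _ in range(N_BITS))] = None
    return list(pool)

def build_dataset(rng, pool, rounds=4, batch_size=8):
    """Label the pool iteratively in small batches; the oracle is called only here."""
    pool, data = list(pool), {}
    for _ in range(rounds):
        if data:
            batch = acquire_batch(pool, data, batch_size)
        else:
            batch = rng.sample(pool, batch_size)  # first batch is random
        for m in batch:
            data[m] = oracle(m)
            pool.remove(m)
    return data

def genetic_search(rng, data, generations=20, pop_size=20, mut_rate=0.1):
    """Genetic algorithm that ranks candidates with the surrogate, never the oracle."""
    def score(m):
        return predict(m, data)[0]
    # Seed the population with the best labelled molecules from the dataset.
    pop = sorted(data, key=data.get, reverse=True)[:pop_size]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)
            cut = rng.randrange(1, N_BITS)  # one-point crossover
            child = tuple(bit ^ (rng.random() < mut_rate) for bit in a[:cut] + b[cut:])
            children.append(child)
        unique = list(dict.fromkeys(pop + children))
        pop = sorted(unique, key=score, reverse=True)[:pop_size]
    return pop

if __name__ == "__main__":
    rng = random.Random(0)
    data = build_dataset(rng, random_pool(rng, 200))
    top = genetic_search(rng, data)
    print("best labelled score:", max(data.values()))
    print("best generated score (re-scored with the oracle):", max(oracle(m) for m in top))
```

In this sketch the oracle is queried only inside `build_dataset`, which mirrors the constraint that ground-truth evaluation is scarce; the final re-scoring in the usage block is there only to inspect the result. Whether the generated candidates beat the labelled ones in this toy depends on the surrogate and the seed, so it should not be read as reproducing the thesis's findings.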

Keywords

Bayesian Optimization, Graph Neural Networks, Molecule Generation, Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service