Publication:
Towards Inverse Design in Chemistry: Machine Learning Applications in Predictive and Generative Models

Date

2019-05-21

Citation

Sanchez Lengeling, Benjamin M. 2019. Towards Inverse Design in Chemistry: Machine Learning Applications in Predictive and Generative Models. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

Many of the challenges of the 21st century, from personalized healthcare to energy production and storage, share a common theme: materials are part of the solution. Groundbreaking advances are likely to come from unexplored regions of chemical space. The central challenge is: how do we design molecules and materials according to a desired functionality? Charting the space of all possible molecules in its entirety is computationally intractable, so we need smarter ways of navigating and mining chemical space. Machine learning tools could potentially solve these problems by showing us how to build computational systems that improve through experience and can generalize beyond a given dataset. In this Ph.D. work, I demonstrate the utility of machine learning for a variety of chemical problems centered around two main themes: 1) building data-driven models for prediction and interpretation of molecular properties and 2) generating and optimizing molecules according to properties via deep generative models. With respect to prediction, I have built models based on Gaussian processes and quantum chemistry calculations in the areas of organic solar cells, solubility parameters, and redox potentials for metabolic reactions. Part of the process involves building chemical insights from predictions to elucidate design rules in each application. On generating molecules, I have developed a machine learning algorithm that combines generative adversarial networks and reinforcement learning in the context of drug-like molecules and is able to generate molecules biased towards properties of interest. I also worked on neural networks that encode molecules as continuous vectors, focusing on developing ways of tracing smooth paths of small chemical changes between any two molecules and optimizing with respect to a target property in this new space.
These models are readily deployable in problems with small organic molecules and where we can build predictive models for the property of interest to a sufficient degree. While there is still more progress to be made before achieving the inverse design of molecules, this work represents substantial progress towards this goal.

Keywords

Chemistry, Inverse Design, Machine Learning, Generative Models, Gaussian Processes

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
