Accelerating drug discovery with quantum chemistry, machine learning, and molecular dynamics

Date

2023-11-21

Citation

Axelrod, Simon. 2023. Accelerating drug discovery with quantum chemistry, machine learning, and molecular dynamics. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Drug discovery is a long and expensive process. Bringing a new drug to market takes an average of 12 years and costs $2.9 billion (USD, 2013). Most new candidates fail during pre-clinical development, and 90% of the remaining compounds fail during clinical trials. To make better drugs at a lower cost, there is a pressing need to accelerate drug discovery and decrease its attrition rate.

The computational approaches of atomistic simulation and machine learning (ML) have emerged as promising tools to accelerate drug discovery. Experimental testing is slow and costly, and the attrition rate of drug candidates is high, so significant experimental resources are spent on non-optimal compounds. ML and simulation can be used to predict the properties of hypothetical drugs before they are tested experimentally. They can therefore be used to screen thousands or millions of drug candidates to find compounds with an ideal set of properties, enabling experimentalists to focus on a small number of promising candidates. While computational tools like docking have long been part of the pharmaceutical development pipeline, their low accuracy has limited them to early-stage hit discovery. The explosion of computational power and advances in computational chemistry have recently enabled more accurate simulations to be run at a larger scale. Meanwhile, progress in ML algorithms, software, and hardware has enabled rapid training of accurate ML models. These models can leverage correlations in computational or experimental data that are expensive to generate. Hence, they can make rapid predictions for new candidates without intensive first-principles calculations or costly experiments.

In this dissertation we develop new computational tools for accelerating drug discovery, and use them to screen large virtual libraries to identify promising drug candidates. In Chapters 2 and 3 we develop new ML algorithms for predicting protein-ligand binding affinities from experimental data. Previous architectures used the 2D molecular graph or a single 3D molecular geometry as input. These representations neglect the flexibility of the ligand, as encapsulated by its 3D conformer ensemble. To incorporate this flexibility, we use quantum chemistry and molecular dynamics (MD) to generate a dataset of conformer ensembles for over 300,000 drug-like molecules (Chapter 2), and use the ensembles to train new models that incorporate molecular flexibility (Chapter 3).
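One simple way to turn a conformer ensemble into a single model input is to average per-conformer feature vectors, weighting each conformer by its Boltzmann population. The following is a minimal sketch under that assumption. It is not the architecture of Chapter 3, and learned models may replace the fixed Boltzmann weights with learned ones. The function names, and the assumption that each conformer already has a feature vector (for example, from a 3D network) and a relative energy in kcal/mol, are hypothetical.

```python
import numpy as np

# Boltzmann constant in kcal/(mol*K)
K_B_KCAL = 0.0019872041

def boltzmann_weights(energies_kcal, temperature=298.15):
    """Normalized Boltzmann weights for conformer energies given in kcal/mol."""
    e = np.asarray(energies_kcal, dtype=float)
    # Shift by the minimum energy for numerical stability; the normalized weights are unchanged.
    x = -(e - e.min()) / (K_B_KCAL * temperature)
    w = np.exp(x)
    return w / w.sum()

def pool_ensemble(conformer_features, energies_kcal, temperature=298.15):
    """Collapse per-conformer features (n_conformers x d) into one d-dimensional vector.

    Hypothetical helper: a fixed Boltzmann-weighted average, used here as a stand-in
    for whatever learned pooling a real model might use.
    """
    feats = np.asarray(conformer_features, dtype=float)
    w = boltzmann_weights(energies_kcal, temperature)
    return w @ feats
```

With equal energies every conformer contributes equally. A conformer lying a few kcal/mol above the minimum contributes almost nothing at room temperature, so the pooled vector is dominated by the low-energy conformers.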

In Chapters 4 to 7 we focus on photopharmacology, the emerging field of light-activated drugs. By controlling drug activity with light, one can confine biological effects to precise locations or times. This can minimize side effects, thereby increasing the maximum deliverable dose and improving patients’ quality of life. Photopharmacology faces an even more difficult optimization problem than regular drug discovery, since one must also optimize the photophysical properties of drugs and the differential activity of the isomers generated by illumination. This makes computation an attractive tool for accelerating photoactive drug discovery.

In Chapter 4 we develop tools for predicting the efficiency of light-induced isomerization of photoactive drugs. We develop a neural network force field (NFF) trained on multi-reference quantum chemistry data, and combine it with nonadiabatic MD algorithms to predict the isomerization yield of molecules outside the training set. This allows us to predict whether new photoactive drug candidates will isomerize as required under illumination. The NFF also enables us to predict absorption wavelengths, a key quantity in photoactive drug design.
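In a swarm of nonadiabatic MD trajectories, the isomerization yield is usually estimated as the fraction of trajectories that end in the product isomer. A predicted excitation energy converts to an absorption wavelength through λ = hc/ΔE. The sketch below shows only this post-processing step. It assumes each trajectory has already been classified (for example, by its final dihedral angle) with string labels such as "cis" or "trans". The labels and function names are hypothetical and are not taken from the dissertation's code.

```python
import math

# Planck constant times speed of light, in eV*nm
HC_EV_NM = 1239.84198

def isomerization_yield(outcomes, product="cis"):
    """Fraction of trajectories ending in `product`, with its binomial standard error."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one trajectory")
    p = sum(1 for o in outcomes if o == product) / n
    stderr = math.sqrt(p * (1.0 - p) / n)
    return p, stderr

def absorption_wavelength_nm(excitation_energy_ev):
    """Convert a vertical excitation energy (eV) into a wavelength (nm)."""
    return HC_EV_NM / excitation_energy_ev
```

The standard error shrinks as 1/√n, which is one reason a fast force field matters: a cheaper model lets many more trajectories be run per molecule.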

In Chapter 5 we predict the thermal stability of the isomers generated by light. We argue that thermal isomerization of azobenzene, the light-activated core of many photoactive ligands, proceeds through an unusual singlet-triplet-singlet pathway. We incorporate this mechanism into a comprehensive workflow for predicting thermal half-lives of azobenzene derivatives. Combining this workflow with a transferable NFF, we screen tens of thousands of hypothetical molecules based on thermal stability.
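Whatever mechanism sets the effective barrier, the final step of a half-life prediction is usually converting a rate into a half-life. For a first-order process, t₁/₂ = ln 2 / k. The sketch below uses plain Eyring transition-state theory, k = (k_B·T/h)·exp(−ΔG‡/RT), with the activation free energy as an assumed input. It does not model the singlet-triplet-singlet pathway of Chapter 5, whose intersystem-crossing steps are not described by a single Eyring barrier. The function names are hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J*s
R_KCAL = 1.987204e-3   # gas constant, kcal/(mol*K)

def eyring_rate(dg_act_kcal, temperature=298.15):
    """First-order rate constant (1/s) from an activation free energy in kcal/mol."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-dg_act_kcal / (R_KCAL * temperature))

def half_life_seconds(dg_act_kcal, temperature=298.15):
    """Half-life (s) of a first-order thermal reaction: ln(2) / k."""
    return math.log(2) / eyring_rate(dg_act_kcal, temperature)
```

Because the barrier enters exponentially, each additional RT·ln 10 ≈ 1.36 kcal/mol at room temperature lengthens the half-life tenfold. Small errors in the computed barrier therefore translate into large errors in predicted half-lives.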

In Chapter 6 we turn to biological properties. We use computational docking to elucidate trends in the differential binding affinity of azobenzene derivatives to a set of medically relevant proteins. We also show how docking can be combined with our recently developed tools and graph-to-property ML models to identify promising photoactive ligands.
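For a photoswitchable ligand, the quantity of interest is the difference in binding between the two isomers, not either affinity alone. If the scores are treated as binding free energies (a strong assumption for docking scores, which are only rough proxies), then ΔΔG = ΔG_cis − ΔG_trans implies a dissociation-constant ratio K_d(cis)/K_d(trans) = exp(ΔΔG/RT). The sketch below computes this quantity and ranks candidates by its magnitude. The tuple layout and function names are hypothetical illustrations, not the analysis used in Chapter 6.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

def differential_binding(dg_trans, dg_cis, temperature=298.15):
    """Return ddG = dG_cis - dG_trans (kcal/mol) and the implied Kd(cis)/Kd(trans).

    A positive ddG means the cis isomer binds more weakly (larger Kd) than the trans isomer.
    """
    ddg = dg_cis - dg_trans
    kd_ratio = math.exp(ddg / (R_KCAL * temperature))
    return ddg, kd_ratio

def rank_by_switching(candidates):
    """Sort (name, dg_trans, dg_cis) tuples so the largest |ddG| comes first."""
    return sorted(candidates, key=lambda c: abs(c[2] - c[1]), reverse=True)
```

Ranking on |ΔΔG| keeps both "cis-on" and "trans-on" switches. A real screen would also require that the active isomer binds strongly in absolute terms.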

In Chapter 7 we combine the algorithms of Chapters 4 to 6 to screen a virtual library of 5 million molecules for photoactive inhibition of the PARP1 enzyme. In addition to docking, we incorporate free energy perturbation to more accurately predict the protein-ligand binding affinity of top candidates. We also introduce a workflow to compute pKa values, since proton transfer from water can be used to optimize the absorption wavelength of azobenzene derivatives. We identify several new molecules with favorable chemical, photophysical, and biological properties. These compounds can form the starting point for experimental testing and further refinement.
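A pKa follows from the aqueous deprotonation free energy through pKa = ΔG_deprot / (RT·ln 10). A common way to avoid computing the solvation free energy of the bare proton is a relative (proton-exchange) scheme. This scheme references HA + Ref⁻ → A⁻ + HRef against an acid with a known experimental pKa, so that pKa(HA) = pKa(HRef) + ΔG_exch / (RT·ln 10). The sketch below implements only these thermodynamic relations. It assumes the free energies come from an upstream calculation (for example, quantum chemistry with a solvation model). Which scheme the Chapter 7 workflow uses is not stated here, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

def pka_from_dg(dg_deprot_kcal, temperature=298.15):
    """pKa from the aqueous free energy (kcal/mol) of HA -> A- + H+ in the 1 M standard state."""
    return dg_deprot_kcal / (R_KCAL * temperature * math.log(10))

def pka_relative(dg_exchange_kcal, pka_ref, temperature=298.15):
    """pKa of HA from HA + Ref- -> A- + HRef, given the experimental pKa of HRef."""
    return pka_ref + pka_from_dg(dg_exchange_kcal, temperature)
```

In the relative scheme, errors that are common to HA and the reference acid partly cancel. This is the usual motivation for choosing a reference acid that is structurally similar to the molecule of interest.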

Altogether, this work presents several computational advances to help accelerate drug discovery and bring photopharmacology to the clinic. We hope that these developments will one day make a small positive impact on human health.

Keywords

Computational chemistry, Drug discovery, Machine learning, Molecular dynamics, Simulation, Chemistry

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.