Publication: Evidence for and Applications of Physics-Based Reasoning in AlphaFold
Abstract
The problem of predicting a protein's 3D structure from its primary amino acid sequence is a longstanding challenge in structural biology. Recently, approaches like AlphaFold have achieved remarkable performance on this task by using deep learning techniques to analyze patterns of variation across evolutionarily related protein sequences. The use of such coevolutionary information is critical to these models' accuracy, and without it their predictive performance drops considerably. In living cells, however, the 3D structure of a protein is fully determined by its primary sequence and the biophysical laws that cause it to fold into a low-energy configuration. Thus, it should be possible to predict a protein's structure from only its primary sequence by learning a highly accurate biophysical energy function. We provide evidence that AlphaFold has learned such an energy function and that it uses coevolution data to solve the global search problem of finding a low-energy conformation. We demonstrate that AlphaFold's learned potential function can be used to rank the quality of candidate protein structures with state-of-the-art accuracy, without using any coevolution data. Finally, we explore practical applications of this learned potential function, including predicting protein structures without coevolution data and predicting the effects of mutations on proteins. By iteratively optimizing protein structures using AlphaFold's learned potential function, we are able to create significantly improved protein structure predictions without the use of coevolution information. This represents an important step toward the goal of predicting protein structures from single sequences using physical principles.