Extending the Reach of Coevolution-Based Protein Residue-Residue Contact Prediction
Abstract
The three-dimensional conformations of macromolecules are critical to their diverse functions; however, many important proteins have as yet undetermined structures. Structural states are constrained by residue-residue interactions, and these constraints leave imprints of evolutionary selection. By decrypting the statistical relationships embedded as covarying positions within related sequences, we can infer structural constraints on the encoded macromolecules. Despite advances in inferring causal interactions from noisy covariations, many proteins remain out of reach because of methodological limitations and limited available data. Thus, for many proteins of interest, novel methods are necessary to extract meaningful structural information from sequence covariation.
Here we present two approaches enabling covariation-based contact prediction for previously unresolvable sequences. First, we develop a systematic covariation-based contact prediction method for intrinsically disordered regions (IDRs). While IDRs do not form stable globular structures, many do occupy functional conformations transiently or under specific conditions. Despite being invisible to most methods, these functional states are still constrained by evolutionary pressure and leave their own signal of covariation. In practice, IDRs pose unique challenges for coupling inference. We develop a coevolution-based approach to address these challenges and validate it on a set of flexible and disordered proteins with experimental evidence for structural states. We then apply our method to a large set of disordered regions within the human proteome, finding that many predicted IDRs show constraints indicating a propensity for structural states.
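As a general illustration of how covarying alignment positions can be scored, the sketch below computes mutual information between every pair of columns in a multiple sequence alignment and applies the average product correction (APC) to reduce background signal. This is a simple, widely used baseline and is not the coupling inference method developed in this thesis; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p_ab * log(p_ab / (p_a * p_b)), written in terms of raw counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def covariation_scores(msa):
    """Return {(i, j): APC-corrected MI} for all column pairs i < j."""
    length = len(msa[0])
    columns = [[seq[i] for seq in msa] for i in range(length)]
    raw = {(i, j): mutual_information(columns[i], columns[j])
           for i, j in combinations(range(length), 2)}
    # Mean raw score of each column against every other column
    col_mean = [0.0] * length
    for (i, j), score in raw.items():
        col_mean[i] += score / (length - 1)
        col_mean[j] += score / (length - 1)
    overall = sum(raw.values()) / len(raw)
    if overall == 0:
        return raw  # no covariation anywhere; nothing to correct
    # APC: subtract the background expected from the two column means
    return {(i, j): score - col_mean[i] * col_mean[j] / overall
            for (i, j), score in raw.items()}


# Hypothetical toy alignment: columns 0 and 3 covary, column 1 is
# conserved, and column 2 varies independently of the others.
toy_msa = ["AGLE", "AGVE", "DGLK", "DGVK"]
scores = covariation_scores(toy_msa)
top_pair = max(scores, key=scores.get)
print(top_pair, round(scores[top_pair], 3))
```

Real alignments would also need sequence reweighting, gap handling, and pseudocounts, all of which are omitted here for brevity.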
Second, we introduce DeepContact, a machine learning approach to enhance coevolution-based contact predictions, particularly for proteins with few homologous sequences. Previous covariation-based methods have not used structural knowledge from the tens of thousands of determined protein structures; we incorporate this knowledge by training a convolutional neural network to predict contacts from the often-noisy inferred couplings. Using solved structures and covariation, we in effect learn coupling and constraint motifs and incorporate them to improve subsequent novel predictions. DeepContact significantly improves the precision/recall performance for contact prediction while also learning the patterns of evolutionary constraint underlying protein structures, facilitating deeper investigation into the evolutionary rules of protein structure.
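For readers unfamiliar with how precision and recall are typically computed for contact prediction, the following minimal sketch ranks predicted residue pairs by score and compares the top-ranked ones against a set of known contacts. It is an illustration only and does not reproduce the evaluation protocol of the thesis; the function name, the `min_separation` default of 6, and the toy data are assumptions.

```python
def precision_recall(scores, true_contacts, top_k, min_separation=6):
    """Precision and recall of the top_k highest-scoring residue pairs.

    scores: {(i, j): score} with i < j
    true_contacts: set of (i, j) pairs in contact in a known structure
    min_separation: ignore pairs closer than this in sequence (assumed value)
    """
    eligible = [pair for pair in scores if pair[1] - pair[0] >= min_separation]
    ranked = sorted(eligible, key=scores.get, reverse=True)[:top_k]
    eligible_true = {pair for pair in true_contacts
                     if pair[1] - pair[0] >= min_separation}
    hits = sum(1 for pair in ranked if pair in eligible_true)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(eligible_true) if eligible_true else 0.0
    return precision, recall


# Hypothetical predicted scores and known contacts
toy_scores = {(0, 7): 0.9, (1, 9): 0.8, (2, 10): 0.1, (0, 2): 0.95}
toy_contacts = {(0, 7), (2, 10), (0, 1)}
precision, recall = precision_recall(toy_scores, toy_contacts, top_k=2)
print(precision, recall)
```

Short-range pairs are excluded because residues adjacent in sequence are trivially close in space; the appropriate cutoff varies between studies.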
These methods expand the applicability of coevolution-based structure prediction, enabling the discovery of many previously unobservable structural states.
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page
http://nrs.harvard.edu/urn-3:HUL.InstRepos:39947169
Collections
- FAS Theses and Dissertations [5858]