Predicting the Effects of Missense Variation on Protein Structure, Function, and Evolution

Title: Predicting the Effects of Missense Variation on Protein Structure, Function, and Evolution
Author: Jordan, Daniel Michael (ORCID: 0000-0002-5318-8225)
Citation: Jordan, Daniel Michael. 2015. Predicting the Effects of Missense Variation on Protein Structure, Function, and Evolution. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract: Estimating the effects of missense mutations is a problem with many important applications in a variety of fields, including medical genetics, evolutionary theory, population genetics, and protein structure and design. Many popular methods exist to solve this problem, the most widely used of which are PolyPhen-2 and SIFT. These methods, along with most other popular methods, rely on multiple sequence alignments of orthologous protein sequences. Based on the amino acids observed in each column of the alignment, they produce a profile describing how tolerated each amino acid is at each position. They then compare the wild-type and variant amino acids to this profile to produce a prediction.
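
The profile-based approach described above can be illustrated with a minimal sketch. This is not the PolyPhen-2 or SIFT algorithm. It is a toy example that assumes a gap-free-or-gapped alignment given as equal-length strings, a uniform pseudocount, and a simple log-ratio score. The function names (`column_profile`, `build_profile`, `variant_score`) and the example alignment are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(column, pseudocount=1.0):
    """Estimate smoothed amino acid frequencies for one alignment column."""
    # Count only standard residues; gaps and unknown symbols are ignored.
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def build_profile(alignment, pseudocount=1.0):
    """Build one frequency table per column of the alignment."""
    # zip(*alignment) transposes the rows (sequences) into columns (positions).
    return [column_profile(col, pseudocount) for col in zip(*alignment)]


def variant_score(profile, position, wild_type, variant):
    """Log-ratio of variant to wild-type frequency at a 0-based position.

    More negative values mean the variant residue is seen less often
    than the wild-type residue in that column (assumed less tolerated).
    """
    freqs = profile[position]
    return math.log(freqs[variant] / freqs[wild_type])


# Hypothetical alignment of four orthologous sequences (illustration only).
alignment = ["MKTA", "MKSA", "MRTA", "MKTG"]
profile = build_profile(alignment)

# Position 0 is fully conserved (M), so M->W scores as poorly tolerated.
conserved_score = variant_score(profile, 0, "M", "W")
# Position 2 already contains S in one ortholog, so T->S scores less severely.
variable_score = variant_score(profile, 2, "T", "S")
print(conserved_score, variable_score)
```

Note that each column is scored independently of every other column, which is the simplification that the second shortcoming listed in this abstract refers to.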

In practice, these methods are fast, robust, and relatively reliable. However, from a theoretical perspective, they have at least three significant shortcomings:
1. They use effects on selection as a proxy for effects on phenotype and protein structure and function.
2. They treat each position as independent, ruling out most forms of interactions between sites.
3. They do not explicitly model the process of evolution, instead assuming that sequences we observe more or less represent an equilibrium state.

With the recent explosion of sequencing technology, as well as the steady increase of computational power, we are now beginning to have enough data to investigate these simplifications and see how much they really affect the performance of these methods.

In this dissertation, I present three such investigations. First, I describe a modified predictor designed to predict risk for a specific disease, hypertrophic cardiomyopathy (HCM), rather than general selective effect. This method achieves significantly higher accuracy than methods without such specific domain knowledge. Next, I describe a model of pairwise interactions between sites, demonstrating both statistically and with in vivo evidence that approximately 7-12% of disease-causing variants may be mispredicted by these methods due to such interactions. Finally, I describe a hybrid method that uses an alignment-based estimator to inform a parametric model of evolution, resulting in a small but significant improvement in accuracy.
Terms of Use: This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:17464216