Predicting the Effects of Missense Variation on Protein Structure, Function, and Evolution
Citation: Jordan, Daniel Michael. 2015. Predicting the Effects of Missense Variation on Protein Structure, Function, and Evolution. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract: Estimating the effects of missense mutations is a problem with many important applications in a variety of fields, including medical genetics, evolutionary theory, population genetics, and protein structure and design. Many popular methods exist to solve this problem, the most widely used of which are PolyPhen-2 and SIFT. These methods, along with most other popular methods, rely on multiple sequence alignments of orthologous protein sequences. Based on the amino acids observed in each column of the alignment, they produce a profile describing how tolerated each amino acid is at each position. They then compare the wild-type and variant amino acids to this profile to produce a prediction.
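The profile-based approach described above can be illustrated with a minimal sketch. This is not the PolyPhen-2 or SIFT algorithm; it is a toy version under stated assumptions: the alignment is a hypothetical list of equal-length strings, each column is smoothed with a uniform pseudocount, and the score is simply the log-ratio of the variant's frequency to the wild-type's frequency in that column. The names `column_profile`, `substitution_score`, and `toy_alignment` are invented for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(alignment, pseudocount=1.0):
    """Return one dict per column mapping amino acid -> smoothed frequency.

    Assumption: a uniform pseudocount stands in for the more careful
    weighting schemes that real predictors use.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Gaps and any non-standard symbols are ignored in the counts.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def substitution_score(profile, position, wild_type, variant):
    """Log-ratio of variant to wild-type frequency at a 0-based position.

    More negative values mean the variant is rarer in that column than
    the wild-type residue, i.e. less tolerated under this toy model.
    """
    freqs = profile[position]
    return math.log(freqs[variant] / freqs[wild_type])


# Hypothetical toy alignment of five orthologous sequences ("-" is a gap).
toy_alignment = ["MKVLA", "MKILA", "MRVLA", "MKVLG", "MKV-A"]
profile = column_profile(toy_alignment)

# Column 0 is fully conserved; column 2 already contains both V and I.
conserved = substitution_score(profile, 0, "M", "W")
variable = substitution_score(profile, 2, "V", "I")
print(f"M->W at conserved site: {conserved:.2f}")
print(f"V->I at variable site:  {variable:.2f}")
```

In this sketch, a substitution at the fully conserved column scores lower than one at the column where the variant residue already appears among the orthologs. Each column is scored on its own, which is the site-independence assumption the abstract goes on to question.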
In practice, these methods are fast, robust, and relatively reliable. However, from a theoretical perspective, they have at least three significant shortcomings:
1. They use effects on selection as a proxy for effects on phenotype and protein structure and function.
2. They treat each position as independent, ruling out most forms of interactions between sites.
3. They do not explicitly model the process of evolution, instead assuming that sequences we observe more or less represent an equilibrium state.
With the recent explosion of sequencing technology, as well as the steady increase of computational power, we are now beginning to have enough data to investigate these simplifications and see how much they really affect the performance of these methods.
In this dissertation, I present three such investigations. First, I describe a modified predictor designed to predict risk for a specific disease, hypertrophic cardiomyopathy (HCM), rather than general selective effect. This method achieves significantly higher accuracy than methods without such specific domain knowledge. Next, I describe a model of pairwise interactions between sites, demonstrating both statistically and with in vivo evidence that approximately 7-12% of disease-causing variants may be mispredicted by these methods due to such interactions. Finally, I describe a hybrid method that uses an alignment-based estimator to inform a parametric model of evolution, resulting in a small but significant improvement in accuracy.
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:17464216