Publication: Molecular phenotypes from evolutionary sequences: Method development and biological applications
Date
2019-05-16
Authors
Green, Anna Gustafson
Citation
Green, Anna Gustafson. 2019. Molecular phenotypes from evolutionary sequences: Method development and biological applications. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract
Evolutionary pressure to create and preserve useful traits leaves traces in the genomes of organisms. These traces can be detected and interpreted with statistical methods to make inferences about an organism's phenotype and evolutionary history. Protein-protein interactions are one such trait: they leave traces detectable as dependencies between residues of two different proteins. Protein-protein interactions underlie many important biological processes, and the ability to detect all of them at residue resolution would be an important step toward understanding organismal phenotype.
In this thesis, I present methodological advances, and accompanying applications, that yield biological discoveries about protein-protein interactions from evolutionary sequence data using a method called evolutionary couplings. In chapter 1, I show the first application of evolutionary couplings at proteome scale, predicting interactions and interfaces for hundreds of proteins in the Escherichia coli genome, with emphasis on proteins that cannot be studied with current experimental methods. I also introduce the largest non-redundant dataset of protein-protein interactions with known structures to date and present it as a resource for future analysis and method development. In chapter 2, I detail a comprehensive, user-friendly command-line application built to make evolutionary couplings analysis accessible to non-expert users, and a Python package built to facilitate development of new functionality. In chapter 3, I use a statistical model based on pairwise amino acid preferences to analyze features of specificity in the eukaryotic protocadherin superfamily. In chapter 4, I predict and analyze structures in the bacterial elongasome and divisome, demonstrating the fine biological details and testable hypotheses that can be learned from these computational methods. Throughout this thesis, I have sought both to develop and apply computational methods and to facilitate their use for discovery in molecular biology.
Keywords
computational biology, genomics, structural biology, evolutionary biology, microbiology
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.