Molecular phenotypes from evolutionary sequences: Method development and biological applications
Green, Anna Gustafson
Citation: Green, Anna Gustafson. 2019. Molecular phenotypes from evolutionary sequences: Method development and biological applications. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract: Evolutionary pressure to create and preserve useful traits leaves traces in the genomes of organisms. These traces can be detected and interpreted with statistical methods to make inferences about an organism's phenotype and evolutionary history. One such trait is protein-protein interaction, which leaves traces detectable as dependencies between residues from two different proteins. Protein-protein interactions underlie many important biological processes, and the ability to detect all protein-protein interactions at residue resolution would be an important step toward understanding organismal phenotype.
In this thesis I present methodological advances and their accompanying applications to make biological discoveries about protein-protein interactions from evolutionary sequence data, using a method called evolutionary couplings. In chapter 1, I show the first application of evolutionary couplings at a proteome scale, to predict interactions and interfaces for hundreds of proteins in the Escherichia coli genome, emphasizing proteins that cannot be studied with current experimental methods. I also introduce the largest non-redundant dataset of protein-protein interactions with known structures to date, and present this as a resource for future analysis and method development. In chapter 2, I detail a comprehensive and user-friendly command line application built to make evolutionary couplings analysis accessible to non-expert users, and a Python package built to facilitate development of new functionality. In chapter 3, I use a statistical model based on pairwise amino acid preferences to analyze features of specificity in the eukaryotic protocadherin superfamily. In chapter 4, I predict and analyze structures in the bacterial elongasome and divisome, demonstrating the fine biological details and testable hypotheses that can be learned from these computational methodologies. In this thesis, I have sought both to develop and apply computational methods and to facilitate the use of these methods for discovery in molecular biology.
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:42029639
- FAS Theses and Dissertations