Dissecting Protein Fitness Landscapes Using DNA Synthesis and Sequencing: Case Studies in AAV and Beyond
Citation
Ogden, Pierce. 2019. Dissecting Protein Fitness Landscapes Using DNA Synthesis and Sequencing: Case Studies in AAV and Beyond. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract
Over the past decade, there has been an explosion of data and information regarding protein sequence and function. However, despite all this knowledge, our ability to determine the effect of even a single mutation on protein function remains limited. Protein mutation effects are challenging to predict primarily due to the sheer complexity of proteins, which grows as \(20^n\), where n is the length of the protein. Thus, integrative massively parallel protein synthesis as well as multiplex functional measurements are needed to fully decipher the protein functional landscape.
In this dissertation, I dissect protein function using three systems: the adeno-associated virus (AAV) capsid, the green fluorescent protein (GFP), and antibody scaffolds. First, to improve our understanding of the AAV capsid fitness landscape, I characterize all possible single-codon substitutions, insertions, and deletions of the AAV2 capsid gene (91,875 mutations) across multiple functions relevant for in vivo delivery. My analysis reveals unifying capsid design principles and the presence of an uncharacterized viral gene: a frameshifted ORF in the VP1 region that expresses a membrane-associated accessory protein (MAAP). Furthermore, I determine that this new protein limits the production of other AAVs through competitive exclusion.
Second, I measure thousands of combinatorial mutations to GFP and investigate our ability to predict their effects using machine learning. Through novel experimental and computational methods, I am able to predict the effect of mutations and determine a method for protein function exploration across distant sequence space. Finally, I develop a method for screening interactions between libraries of two gene elements; I utilize it to screen therapeutically relevant protein binding motifs against thousands of proteins from the human proteome. Taken together, this thesis lays the groundwork for making the high-throughput measurements needed to dissect a protein’s function, enabling a future of complex protein design.
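The \(20^n\) scaling mentioned in the abstract can be made concrete with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the dissertation: the function names are hypothetical, and it assumes an alphabet of the 20 standard amino acids. It contrasts the number of single amino-acid substitutions of a length-n protein, which grows linearly, with the full sequence space, which grows exponentially. It does not attempt to reproduce the 91,875-mutation count, which also includes insertions and deletions.

```python
# Illustrative sketch (hypothetical helper names, not from the dissertation).
# Assumes an alphabet of the 20 standard amino acids.

def sequence_space_size(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of the given length: alphabet_size ** length."""
    return alphabet_size ** length


def single_substitution_count(length: int, alphabet_size: int = 20) -> int:
    """Number of variants that differ from a reference at exactly one position."""
    return length * (alphabet_size - 1)


if __name__ == "__main__":
    # Compare linear growth (single substitutions) with exponential growth
    # (all possible sequences) for a few protein lengths.
    for n in (1, 2, 5, 10, 100):
        singles = single_substitution_count(n)
        total = sequence_space_size(n)
        print(f"n={n:>3}  single substitutions={singles:>5}  all sequences={total:.3e}")
```

Even for short proteins, the set of all sequences quickly outgrows anything that could be measured exhaustively, whereas the set of single mutants stays within reach of a multiplexed experiment.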
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page
http://nrs.harvard.edu/urn-3:HUL.InstRepos:42013035
Collections
- FAS Theses and Dissertations [5858]