Publication:
Dissecting Protein Fitness Landscapes Using DNA Synthesis and Sequencing: Case Studies in AAV and Beyond

Date

2019-09-11

Citation

Ogden, Pierce. 2019. Dissecting Protein Fitness Landscapes Using DNA Synthesis and Sequencing: Case Studies in AAV and Beyond. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

Over the past decade, data on protein sequence and function have grown explosively. Despite this knowledge, our ability to determine the effect of even a single mutation on protein function remains limited. Mutation effects are challenging to predict primarily because of the sheer complexity of proteins: the number of possible sequences grows as \(20^n\), where \(n\) is the length of the protein. Thus, integrated, massively parallel protein synthesis and multiplexed functional measurements are needed to fully decipher the protein functional landscape. In this dissertation, I dissect protein function using three systems: the adeno-associated virus (AAV) capsid, the green fluorescent protein (GFP), and antibody scaffolds. First, to improve our understanding of the AAV capsid fitness landscape, I characterize all possible single-codon substitutions, insertions, and deletions of the AAV2 capsid gene (91,875 mutations) across multiple functions relevant for in vivo delivery. My analysis reveals unifying capsid design principles and the presence of an uncharacterized viral gene: a frameshifted ORF in the VP1 region that expresses a membrane-associated accessory protein (MAAP). Furthermore, I determine that this new protein limits the production of other AAVs through competitive exclusion. Second, I measure thousands of combinatorial mutations to GFP and investigate our ability to predict their effects using machine learning. Through novel experimental and computational methods, I predict the effects of mutations and establish a method for exploring protein function across distant sequence space. Finally, I develop a method for screening interactions between libraries of two gene elements, and I use it to screen therapeutically relevant protein binding motifs against thousands of proteins from the human proteome.
Taken together, this thesis lays the groundwork for making the high-throughput measurements needed to dissect a protein’s function, enabling a future of complex protein design.
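
To make the \(20^n\) scaling mentioned in the abstract concrete, the following is a minimal Python sketch, not code from the dissertation. It assumes a 20-letter amino acid alphabet and counts (a) all possible sequences of length n and (b) the much smaller set of single amino acid substitutions around one reference sequence. The function names and the example lengths are hypothetical, chosen only for illustration; the counts are at the amino acid level and are not meant to reproduce the codon-level library size quoted above.

```python
# Illustrative only: function names and example lengths are assumptions,
# not taken from the dissertation.

def sequence_space_size(n: int, alphabet: int = 20) -> int:
    """Number of distinct sequences of length n over the given alphabet."""
    return alphabet ** n


def single_substitution_count(n: int, alphabet: int = 20) -> int:
    """Number of variants that differ from a reference at exactly one position."""
    return n * (alphabet - 1)


if __name__ == "__main__":
    for n in (1, 10, 100):
        total = sequence_space_size(n)
        singles = single_substitution_count(n)
        # Python integers have arbitrary precision, so 20**100 is exact.
        print(f"n={n}: {total:.3e} sequences, {singles} single substitutions")
```

The contrast between the two counts is the point: exhaustive single-mutant scans scale linearly with protein length, whereas the full landscape scales exponentially and cannot be enumerated.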

Keywords

AAV, Synthetic biology, machine learning, biologics, binders, GFP, protein engineering

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
