Evolution and Computational Generation of Highly Functionalized Nucleic Acid Polymers
Access Status: Full text of the requested work is not available in DASH at this time ("dark deposit").
Chen, Jonathan Chris
Citation: Chen, Jonathan Chris. 2021. Evolution and Computational Generation of Highly Functionalized Nucleic Acid Polymers. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
In vitro selection enables the identification of polymers with fundamental activities such as target binding and reaction catalysis. Although in vitro selections can begin with combinatorial libraries approaching 10^15 distinct molecules, these initial libraries often lack the chemical functionality needed to support high on-target activity. Here we describe a ligase-mediated, DNA-templated polymerization system with access to 32 building blocks bearing eight chemically diverse side-chains on a DNA backbone. This sequence-defined synthetic polymer system, highly functionalized nucleic acid polymers (HFNAPs), supports iterated rounds of in vitro selection against protein and small-molecule targets, yielding binders for PCSK9, IL-6, and daunomycin, all with nanomolar KD values. Using high-throughput sequencing (HTS) to better understand how in vitro selections progress, we demonstrate the effectiveness of re-selection and evolution, and we improve selection stringency by biasing selections toward slow off-rates.
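A little arithmetic shows why even a 10^15-member library covers so little of the space. The sketch below assumes that every position in a polymer can independently hold any of the 32 building blocks. The polymer lengths it tries are hypothetical, because the abstract does not state them, and the helper names are illustrative only.

```python
NUM_BUILDING_BLOCKS = 32  # from the abstract
LIBRARY_SIZE = 10**15     # approximate upper bound on starting-library diversity


def sequence_space(length: int, alphabet: int = NUM_BUILDING_BLOCKS) -> int:
    """Distinct sequences of `length` positions, assuming any block at any position."""
    return alphabet ** length


def fraction_sampled(length: int, library_size: int = LIBRARY_SIZE) -> float:
    """Upper bound on the fraction of sequence space one library can contain."""
    return min(1.0, library_size / sequence_space(length))


# Hypothetical polymer lengths, measured in building blocks.
for n in (5, 10, 15, 20):
    print(f"length={n:2d}  space={sequence_space(n):.2e}  max fraction={fraction_sampled(n):.2e}")
```

Under this assumption, a polymer only 10 building blocks long already has 32^10 ≈ 1.1 × 10^15 possible sequences, which is about the size of the largest starting libraries. Each additional position divides the coverable fraction by 32.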
HTS also enables us to examine the role that individual side-chains play in binding interactions and translation efficiency. Our neutral selections reveal translation biases that arise from side-chain inclusion and from ligase sequence preferences. Structure-activity relationship studies show that specific side-chains are necessary for binding, and parallel selections with seven genetic codes demonstrate the critical importance of non-polar side-chains.
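Translation bias of this kind is commonly quantified by comparing building-block frequencies in sequencing reads before and after a step such as translation. The abstract does not describe the analysis pipeline. The sketch below is a generic log2 frequency-ratio calculation, and it assumes the reads have already been decoded into building-block labels. The labels "A" to "D" and the toy reads are hypothetical.

```python
import math
from collections import Counter


def block_frequencies(reads):
    """Frequency of each building block across all decoded reads."""
    counts = Counter(block for read in reads for block in read)
    total = sum(counts.values())
    return {block: n / total for block, n in counts.items()}


def log2_enrichment(pre_reads, post_reads, pseudocount=1e-6):
    """log2(post/pre) frequency ratio per building block.

    Negative values mean a block is under-represented after the step
    (e.g. poorly translated); positive values mean over-represented.
    The pseudocount avoids division by zero for blocks absent from one set.
    """
    pre = block_frequencies(pre_reads)
    post = block_frequencies(post_reads)
    return {
        b: math.log2((post.get(b, 0.0) + pseudocount) / (pre.get(b, 0.0) + pseudocount))
        for b in set(pre) | set(post)
    }


# Hypothetical toy reads, already decoded into side-chain labels.
pre_library = [["A", "B", "C", "D"], ["B", "C", "D", "A"]]
post_translation = [["A", "A", "C", "D"], ["A", "C", "D", "A"]]
print(sorted(log2_enrichment(pre_library, post_translation).items()))
```

In a neutral selection there is no binding step, so a systematic shift in such ratios points to biases in translation itself rather than to binding.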
Despite these successful selection campaigns, in vitro selections sample only a minuscule fraction of the total HFNAP sequence space. Selection-induced sequence convergence and limited sequencing depth further constrain our ability to explore and understand in vitro selection fitness landscapes. We therefore trained a conditional variational autoencoder (CVAE) machine learning model on in vitro selection data to learn the relationship between sequence identity and binding affinity. Remarkably, the trained CVAE generated diverse and novel HFNAP sequences with binding affinities (KD = 13–15 nM) similar to those of the most active HFNAPs from in vitro selection, even though the CVAE-generated polymers lack sequence similarity to experimentally identified sequences. Select CVAE-generated polymers share conserved secondary structure with polymers derived from in vitro selection, demonstrating the model’s ability to map the global fitness landscape for daunomycin binding. Coupling in vitro selection with a machine learning model that learned the fitness landscape of a binding task thus enabled direct generation of active variants, demonstrating a new approach to the discovery of highly active biopolymers.
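The abstract does not give the CVAE's architecture, so the following is only a generic sketch of how a conditional VAE couples sequences to a binding label.

- The encoder q(z | x, c) sees a one-hot sequence together with a condition c.
- The decoder p(x | z, c) reconstructs the sequence from a latent vector plus the same condition.
- Training minimizes reconstruction error plus a KL penalty.
- Generation samples z from the prior and decodes under a chosen condition, for example "high affinity".

Several values are assumptions: the polymer length, latent and hidden sizes, the scalar condition, the single-layer networks, and the standard-normal prior. The weights are random and untrained, and the training loop is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

SEQ_LEN = 12     # hypothetical polymer length (building blocks)
ALPHABET = 32    # building blocks, from the abstract
COND_DIM = 1     # hypothetical scalar condition, e.g. a binned affinity label
LATENT_DIM = 8   # hypothetical latent size
HIDDEN = 64      # hypothetical hidden-layer width


def one_hot(seq):
    x = np.zeros((SEQ_LEN, ALPHABET))
    x[np.arange(SEQ_LEN), seq] = 1.0
    return x.reshape(-1)


def init(in_dim, out_dim):
    return rng.normal(0.0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)


# Encoder q(z | x, c) and decoder p(x | z, c), one hidden layer each (untrained).
W_enc, b_enc = init(SEQ_LEN * ALPHABET + COND_DIM, HIDDEN)
W_mu, b_mu = init(HIDDEN, LATENT_DIM)
W_lv, b_lv = init(HIDDEN, LATENT_DIM)
W_dec, b_dec = init(LATENT_DIM + COND_DIM, HIDDEN)
W_out, b_out = init(HIDDEN, SEQ_LEN * ALPHABET)


def encode(x, c):
    h = np.tanh(np.concatenate([x, c]) @ W_enc + b_enc)
    return h @ W_mu + b_mu, h @ W_lv + b_lv  # mean and log-variance of q(z | x, c)


def decode(z, c):
    h = np.tanh(np.concatenate([z, c]) @ W_dec + b_dec)
    logits = (h @ W_out + b_out).reshape(SEQ_LEN, ALPHABET)
    # Numerically stable log-softmax over the 32 building blocks at each position.
    logits = logits - logits.max(axis=1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))


def cvae_loss(seq, c):
    """Negative ELBO: reconstruction cross-entropy + KL(q(z|x,c) || N(0, I))."""
    mu, logvar = encode(one_hot(seq), c)
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(LATENT_DIM)  # reparameterization
    log_probs = decode(z, c)
    recon = -log_probs[np.arange(SEQ_LEN), seq].sum()
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
    return recon + kl, recon, kl


def generate(c, n=3):
    """Sample z from the assumed N(0, I) prior; take the most likely block per position."""
    return [decode(rng.standard_normal(LATENT_DIM), c).argmax(axis=1) for _ in range(n)]


seq = rng.integers(0, ALPHABET, SEQ_LEN)
high_affinity = np.array([1.0])  # hypothetical "high affinity" condition value
loss, recon, kl = cvae_loss(seq, high_affinity)
print(f"loss={loss:.2f} recon={recon:.2f} kl={kl:.4f}")
print(generate(high_affinity))
```

Because the condition is fed to both the encoder and the decoder, a trained model can be asked for new sequences under a fixed affinity label. This is one way a CVAE can produce active sequences that share little identity with any single selected sequence.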
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37368339