Publication:
Evolution and Computational Generation of Highly Functionalized Nucleic Acid Polymers

Date

2021-07-12

The Harvard community has made this article openly available.

Citation

Chen, Jonathan Chris. 2021. Evolution and Computational Generation of Highly Functionalized Nucleic Acid Polymers. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

In vitro selection enables the identification of polymers with fundamental activities such as target binding and reaction catalysis. While in vitro selections can begin with combinatorial libraries approaching ~10^15 distinct molecules, these initial libraries often lack sufficient chemical functionality to support high on-target activity. Here we describe a ligase-mediated DNA-templated polymerization system with access to 32 building blocks containing eight chemically diverse side-chains on a DNA backbone. This sequence-defined synthetic polymer system, highly functionalized nucleic acid polymers (HFNAPs), supports iterated rounds of in vitro selection against protein and small-molecule targets, yielding binders for PCSK9, IL-6, and daunomycin, all with nanomolar KD values. Using high-throughput sequencing (HTS) to gain a deeper understanding of in vitro selection progression, we demonstrate the effectiveness of re-selection and evolution, and we improve selection stringency by biasing for slow off-rates. HTS also enables us to examine the role individual side-chains play in binding interactions and translation efficiency. Our neutral selections elucidate translation biases arising from side-chain inclusion and ligase sequence preferences. The structure-activity relationship studies reveal that specific side-chains are necessary for binding, and our parallel selections with seven genetic codes demonstrate the critical importance of non-polar side-chains. Despite these successful selection campaigns, in vitro selections sample only a minuscule fraction of the total HFNAP sequence space. Selection-induced sequence convergence and limited sequencing depth further constrain our ability to explore and understand in vitro selection fitness landscapes. Therefore, we trained a conditional variational autoencoder (CVAE) machine learning model on in vitro selection data to learn the relationship between sequence identity and binding affinity.
Remarkably, the trained CVAE generated diverse and novel HFNAP sequences with binding affinities (KD = 13–15 nM) similar to those of the most active HFNAPs from in vitro selection, even though the CVAE-generated polymers lack sequence similarity with experimentally identified sequences. Select CVAE-generated polymers share conserved secondary structure with polymers derived from in vitro selection, demonstrating the model’s ability to map the global fitness landscape for daunomycin binding. Coupling in vitro selection with a machine learning model that learned the fitness landscape of a binding task thus enabled direct generation of active variants, demonstrating a new approach to the discovery of highly active biopolymers.

Keywords

Chemistry

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
