Publication:
Development and validation of computational models for efficient design of biological sequences

Date

2022-01-10

Citation

Shin, Jung-Eun. 2021. Development and validation of computational models for efficient design of biological sequences. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

There is a surge of interest in designing a wide variety of proteins for use as molecular research tools and biotherapeutics, with the promise of revolutionizing our capacity to design what we need at will. This is particularly true in research areas with unmet needs, e.g., antibodies, gene editing, therapeutic delivery, and vaccine development. The opportunity to address these unmet needs arises from two major advances over the last ten years: (i) new high-throughput technologies, including deep next-generation sequencing and massive stochastic synthesis of large libraries, have greatly reduced the cost of reading (sequencing) and writing (synthesizing) DNA sequences; and (ii) major advances in computational methods and computing power have unlocked new scales of data analysis, modeling, inference, and generation. The underlying premise of this thesis is that the now large and ever-increasing sequence diversity allows us to build methods that can learn implicit patterns and rules well enough to design new sequences with similar or improved functions. The sequence diversity we learn from can be natural (from across evolution and immune repertoires) or synthetic (sequenced from selection experiments on enormous stochastic libraries). The most successful computational methods I developed are generative and probabilistic models embedded in deep neural networks. The methods developed and validated in this thesis were inspired, on the one hand, by the success of generative models in biology in predicting 3D structure and the effects of mutations and, on the other hand, by the success of natural language models in translation, speech, and text generation.
In my thesis I present three projects that address bottlenecks in antibody/nanobody discovery, each validating computational approaches experimentally through collaborations, and a fourth, more theoretical project that develops methods to design proteins with specific functionality, with concrete applications such as viral viability, protein fluorescence, and enzymatic activity.

Keywords

Antibody, Computational biology, Machine learning, Nanobody, Protein design, Biology, Systematic biology, Bioinformatics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
