Development and validation of computational models for efficient design of biological sequences
Citation: Shin, Jung-Eun. 2021. Development and validation of computational models for efficient design of biological sequences. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract: There is a huge surge of interest in designing a wide variety of proteins for use as molecular research tools and biotherapeutics, promising to revolutionize our capacity to design what we need at will. This is particularly true in research areas with unmet needs, e.g. antibodies, gene editing, therapeutic delivery, and vaccine development. The opportunity to address these unmet needs arises from two major advances over the last ten years: (i) new high-throughput technologies, including deep next-generation sequencing and massive stochastic synthesis of large libraries, have greatly reduced the cost of reading (sequencing) and writing (synthesizing) DNA sequences; and (ii) major advances in computational methods and power have unlocked access to new scales of data analysis, modeling, inference, and generation.
The underlying premise of this thesis is that the now large and ever-increasing sequence diversity allows us to build methods that can learn implicit patterns and rules well enough to design new sequences with similar or improved functions. The sequence diversity we learn from can be natural (from across evolution and immune repertoires) or synthetic (sequenced from selection experiments on enormous stochastic libraries). The most successful computational methods I developed are generative and probabilistic models embedded in deep neural networks. The methods developed and validated in this thesis were inspired, on the one hand, by the success of generative models in biology in predicting 3D structure and the effects of mutations and, on the other hand, by the success of natural language models in translation, speech, and text generation.
In my thesis I present three projects that address bottlenecks in antibody/nanobody discovery, each pairing computational approaches with experimental validation carried out through collaborations. A fourth project is a more theoretical development of methods to design proteins with specific functionality, with concrete applications to examples such as viral viability, protein fluorescence, and enzymatic activity.
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37371128
- FAS Theses and Dissertations