dc.contributor.advisor: Marks, Debora S
dc.contributor.author: Shin, Jung-Eun
dc.date.accessioned: 2022-03-18T04:15:39Z
dc.date.created: 2022
dc.date.issued: 2022-01-10
dc.date.submitted: 2022-03
dc.identifier.citation: Shin, Jung-Eun. 2021. Development and validation of computational models for efficient design of biological sequences. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
dc.identifier.other: 28962994
dc.identifier.uri: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37371128
dc.description.abstract: There is a surge of interest in designing a wide variety of proteins for use as molecular research tools and biotherapeutics, promising to revolutionize our capacity to design what we need at will. This is particularly true in research areas with unmet needs, e.g. antibodies, gene editing, therapeutic delivery, and vaccine development. The opportunity to address these unmet needs arises from two major advances over the last ten years: (i) new high-throughput technologies, including deep next-generation sequencing and massive stochastic synthesis of large libraries, have greatly reduced the cost of reading (sequencing) and writing (synthesizing) DNA sequences; and (ii) major advances in computational methods and power have unlocked access to new scales of data analysis, modeling, inference, and generation. The underlying premise of this thesis is that the now large and ever-increasing sequence diversity allows us to build methods that can learn implicit patterns and rules well enough to design new sequences with similar or improved functions. The sequence diversity we learn from can be natural (from across evolution and immune repertoires) or synthetic (sequenced from selection experiments on enormous stochastic libraries). The most successful computational methods I developed are generative, probabilistic models embedded in deep neural networks. The methods developed and validated in this thesis were inspired on the one hand by the success of generative models in biology in predicting 3D structure and the effects of mutations, and on the other hand by the success of natural language models in translation, speech, and text generation.
In this thesis I present three projects that address bottlenecks in antibody/nanobody discovery, in which the computational approaches were validated experimentally through collaborations, and a fourth project that is a more theoretical development of methods to design proteins with specific functionality, with concrete applications to examples such as viral viability, protein fluorescence, and enzymatic activity.
dc.format.mimetype: application/pdf
dc.language.iso: en
dash.license: LAA
dc.subject: Antibody
dc.subject: Computational biology
dc.subject: Machine learning
dc.subject: Nanobody
dc.subject: Protein design
dc.subject: Biology
dc.subject: Systematic biology
dc.subject: Bioinformatics
dc.title: Development and validation of computational models for efficient design of biological sequences
dc.type: Thesis or Dissertation
dash.depositing.author: Shin, Jung-Eun
dc.date.available: 2022-03-18T04:15:39Z
thesis.degree.date: 2021
thesis.degree.grantor: Harvard University Graduate School of Arts and Sciences
thesis.degree.level: Doctoral
thesis.degree.name: Ph.D.
dc.contributor.committeeMember: Mitchison, Timothy
dc.contributor.committeeMember: Zitnik, Marinka
dc.contributor.committeeMember: Cepko, Constance
dc.type.material: text
thesis.degree.department: Systems Biology
dc.identifier.orcid: 0000-0002-0039-7373
dash.author.email: jes.june@gmail.com

