Publication:

Generative Statistical Methods for Biological Sequences

Date

2022-05-11

Citation

Weinstein, Eli Nathan. 2022. Generative Statistical Methods for Biological Sequences. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Measuring and making sequences is central to modern biology and biomedicine. From evolutionary biology to immunology to therapeutics and beyond, scientists collect massive datasets of DNA, RNA and protein sequences, and create new sequences in the laboratory through large-scale DNA synthesis or genome editing. This dissertation is about the problem of learning from measurements of complex sequence data and predicting unobserved or future sequences that can be made in the laboratory. The dissertation describes new generative statistical methods for biological sequences, working within the framework of Bayesian statistics and probabilistic machine learning, and establishes theoretical guarantees on these methods using frequentist analysis. Part I proposes new tools for building biological sequence models, critiquing biological sequence models, and designing experiments to synthesize samples from biological sequence models. Part II deals with the use of misspecified models in biological sequence analysis and beyond, developing a new understanding of how such “wrong” models can be used effectively for estimation and discovery. Overall, the dissertation contributes principles and methods for reliable and accurate prediction, analysis and design of biological sequences across biology and biomedicine.

Keywords

Biophysics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
