Publication: Nonparametric Methods for Building and Evaluating Models of Biological Sequences
Date
2023-07-25
Citation
Amin, Alan Nawzad. 2023. Nonparametric Methods for Building and Evaluating Models of Biological Sequences. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
Probabilistic models of biological sequences are used to design drugs, make predictions about human health, and learn basic biology. Sequence data is high-dimensional, so a probabilistic model must make biological assumptions to predict and infer. However, these assumptions can come at the cost of the model's flexibility, fundamentally limiting its ability to make accurate predictions and learn new biology. Modern sequencing efforts and high-throughput experimentation are generating an ever-increasing amount of sequence data, in principle providing increasing information to learn the complexity of real sequence data. To leverage this wealth of data, this thesis builds nonparametric models and tests of sequences that incorporate biological prior knowledge while remaining flexible. This thesis builds methods to perform efficient, flexible, and reliable prediction and inference from DNA and protein data, at large and small scales, and in supervised and unsupervised settings.
Keywords
Systematic biology, Statistics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service