Publication:
Nonparametric Methods for Building and Evaluating Models of Biological Sequences

Date

2023-07-25

Published Version

The Harvard community has made this article openly available.

Citation

Amin, Alan Nawzad. 2023. Nonparametric Methods for Building and Evaluating Models of Biological Sequences. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Probabilistic models of biological sequences are used to design drugs, make predictions about human health, and learn basic biology. Sequence data is high-dimensional, so a probabilistic model must make biological assumptions to predict and infer. However, these assumptions can come at the cost of the flexibility of the model, fundamentally limiting its ability to make accurate predictions and learn new biology. Modern sequencing efforts and high-throughput experimentation are generating an ever-increasing amount of sequence data, in principle providing increasing information to learn the complexity of real sequence data. To leverage this wealth of data, this thesis builds nonparametric models and tests of sequences that incorporate biological prior knowledge while remaining flexible. This thesis builds methods to perform efficient, flexible, and reliable prediction and inference from DNA and protein data, at large and small scale, and in supervised and unsupervised settings.

Keywords

Systematic biology, Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
