A mutation rate model at the basepair resolution and selection metric using deviation of multinomial site-frequency spectrum

Date

2025-01-17

Citation

Lee, Daniel Jaewon. 2025. A mutation rate model at the basepair resolution and selection metric using deviation of multinomial site-frequency spectrum. Doctoral Dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Finding mutations under negative selection is a problem with important applications in evolutionary theory, population genetics, and rare disease research. A key concept in determining whether a mutation is under strong negative selection is mutation-selection balance: an equilibrium in the number of deleterious alleles in a population, reached when their introduction by mutation is offset by their elimination by selection. With the recent explosion of sequencing data, we can now approach this decades-old concept with novel methods. In this dissertation, I present two such methods. First, I describe Roulette, a genome-wide mutation rate model at basepair resolution that incorporates known determinants of local mutation rate. Roulette is shown to be more accurate than previous models, and it is used for various applications, such as refining the estimate of recent population growth and finding novel mutational mechanisms. Second, I introduce multiSFS, a new statistic for identifying genomic regions under negative selection. Whereas traditional methods have focused on the number of segregating sites, multiSFS uses the deviation of the site frequency spectrum (SFS) to identify genomic regions under negative selection. MultiSFS demonstrates enhanced power on simulated data, shows greater enrichment in coding sequence regions, and achieves higher accuracy in predicting pathogenic variants.
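The abstract contrasts counting segregating sites with measuring how the SFS deviates from expectation. As a rough illustration only, the Python sketch below scores such a deviation with a standard multinomial likelihood-ratio (G) statistic, comparing observed SFS counts with the standard neutral expectation that the number of sites carrying i derived alleles (out of n sampled chromosomes) is proportional to 1/i. The function names `neutral_sfs_proportions` and `multinomial_sfs_deviance` are hypothetical, and this is an assumed stand-in, not the multiSFS statistic defined in the dissertation, whose exact formulation is not given here.

```python
import math
from typing import Sequence


def neutral_sfs_proportions(n: int) -> list[float]:
    """Expected unfolded SFS proportions under the standard neutral model.

    Hypothetical helper. For a sample of n chromosomes, the expected number
    of segregating sites with derived-allele count i (1 <= i <= n - 1) is
    proportional to 1/i; the weights are normalized to sum to 1.
    """
    if n < 2:
        raise ValueError("need at least 2 chromosomes")
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


def multinomial_sfs_deviance(
    observed: Sequence[int], expected_props: Sequence[float]
) -> float:
    """Multinomial deviance (G statistic) of observed SFS counts.

    Hypothetical helper, an assumed illustration rather than multiSFS itself.
    G = 2 * sum_i o_i * ln(o_i / (N * p_i)), where N is the total number of
    segregating sites. Bins with o_i = 0 contribute 0 (limit of x ln x).
    """
    if len(observed) != len(expected_props):
        raise ValueError("observed and expected_props must have the same length")
    n_sites = sum(observed)
    if n_sites == 0:
        return 0.0
    g = 0.0
    for o, p in zip(observed, expected_props):
        if o > 0:
            g += o * math.log(o / (n_sites * p))
    return 2.0 * g


if __name__ == "__main__":
    # Toy counts for n = 5 chromosomes (derived-allele counts 1..4); not real data.
    props = neutral_sfs_proportions(5)
    print(multinomial_sfs_deviance([48, 24, 16, 12], props))  # close to 0: matches neutral shape
    print(multinomial_sfs_deviance([80, 10, 6, 4], props))    # large: excess of singletons
```

Under the null and with large counts, G is approximately chi-squared distributed with (number of bins − 1) degrees of freedom. G is unsigned: an excess of rare variants, the pattern expected under negative selection, and a deficit both raise it, so a selection-oriented statistic would also need to account for the direction of the shift.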

Keywords

human genetics, mutation rate, population genetics, selection, Bioinformatics, Genetics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service