Publication:

Complex Forms of Structural Variation in the Human Genome: Haplotypes, Evolution, and Relationship to Disease

Date

2015-01-15

Citation

Boettger, Linda M. 2015. Complex Forms of Structural Variation in the Human Genome: Haplotypes, Evolution, and Relationship to Disease. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

Genomic mutations arise in many forms, ranging from single-base-pair substitutions to complicated sets of overlapping copy number variants (CNVs). While each type of variation contributes to phenotype, complex structural variation, which comprises multiple mutations, is difficult to type across many individuals and is largely omitted from genomic studies. This thesis presents methods to type complex structural variation, understand how it evolves, and integrate these complex variants into phenotype association studies. We focused on four structurally complex regions in the human genome. The 17q21.31 region contains an inversion, previously uncharacterized overlapping copy number variants, and SNPs that associate with the female meiotic recombination rate and female fertility [1]. The haptoglobin (HP) gene at chromosome 16q22.2 contains a 1.7 kb tandem duplication [2], previously uncharacterized paralogous gene conversion, and nearby SNPs that associate with cholesterol levels [3]. The haptoglobin-related gene (HPR) at chromosome 16q22.2 segregates as a multi-allelic copy number variant (mCNV) specifically in African populations. Lastly, complement component 4 (C4) at chromosome 6p21.3 contains a length polymorphism, paralogous sequence variation, and copy number variation segregating in humans and non-human primates [4]. We developed methods to characterize the complex structural variation in each of these four regions, type the variation at the population level, and integrate it into association studies. Briefly, we determined the breakpoints of each individual structural variant, typed each variant in a population cohort, and learned which variants segregate together through trio inheritance patterns. Once these structural haplotypes were defined, we phased them with surrounding SNP haplotypes and used these data both as a reference panel for imputation into disease cohorts and to better understand their evolutionary history.
We found that two overlapping duplications in the 17q21.31 region rose rapidly and independently to high frequency within European populations, and may account for the regional association with female fertility and the female meiotic recombination rate. We also found that a recurrent deletion in the HP gene associates with total cholesterol and LDL cholesterol levels. The methods developed in this thesis enable the integration of structurally complex variation into future association studies so that we can begin to understand its effects on phenotypes.

Keywords

Biology, Genetics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
