Complex Forms of Structural Variation in the Human Genome: Haplotypes, Evolution, and Relationship to Disease
Citation: Boettger, Linda M. 2015. Complex Forms of Structural Variation in the Human Genome: Haplotypes, Evolution, and Relationship to Disease. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract: Genomic mutations arise in many forms, ranging from single base pair substitutions to complicated sets of overlapping copy number variants (CNVs). While each type of variation contributes to phenotype, complex structural variation, in which multiple mutations occur at the same locus, is difficult to type across many individuals and is largely omitted from genomic studies. This thesis presents methods to type complex structural variation, understand how it evolves, and integrate these complex variants into studies of association with phenotypes.
We focused on four structurally complex regions in the human genome. The 17q21.31 region contains an inversion, previously uncharacterized overlapping copy number variants, and SNPs that associate with the female meiotic recombination rate and female fertility [1]. The haptoglobin (HP) gene at chromosome 16q22.2 contains a 1.7 kb tandem duplication [2], previously uncharacterized paralogous gene conversion, and nearby SNPs that associate with cholesterol levels [3]. The haptoglobin-related gene (HPR), also at chromosome 16q22.2, segregates as a multi-allelic copy number variant (mCNV) specifically in African populations. Lastly, complement component 4 (C4) at chromosome 6p21.3 contains a length polymorphism, paralogous sequence variation, and copy number variation segregating in humans and non-human primates [4].
We developed methods to characterize the complex structural variation in each of these four regions, type the variation at the population level, and integrate it into association studies. Briefly, we determined the breakpoints of each individual structural variant, typed each variant in a population cohort, and learned which variants segregate together from trio inheritance patterns. Once these structural haplotypes were defined, we phased them with surrounding SNP haplotypes and used the resulting data both as a reference panel for imputation into disease cohorts and to better understand the evolutionary history of the structural haplotypes.
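The trio step described above can be illustrated with a minimal sketch. It is not the method used in the thesis; it only shows the underlying logic under simplifying assumptions: each individual is typed for a single total (diploid) copy number, each haplotype carries an integer copy number, and there are no de novo events or genotyping errors. The function names and the `max_hap_cn` cap are hypothetical and chosen for illustration.

```python
from itertools import product


def haplotype_splits(diploid_cn, max_hap_cn=4):
    """Unordered ways to split a diploid copy number across two haplotypes.

    max_hap_cn is an assumed upper bound on copies per haplotype.
    """
    return [
        (a, diploid_cn - a)
        for a in range(diploid_cn // 2 + 1)
        if diploid_cn - a <= max_hap_cn
    ]


def consistent_trio_configs(father_cn, mother_cn, child_cn, max_hap_cn=4):
    """List parental haplotype configurations compatible with the child.

    Each result is (father_haplotypes, mother_haplotypes, transmissions),
    where transmissions holds the (paternal, maternal) haplotype copy
    numbers that sum to the child's diploid copy number.
    """
    configs = []
    for f_haps, m_haps in product(
        haplotype_splits(father_cn, max_hap_cn),
        haplotype_splits(mother_cn, max_hap_cn),
    ):
        transmissions = sorted(
            {(f, m) for f in f_haps for m in m_haps if f + m == child_cn}
        )
        if transmissions:
            configs.append((f_haps, m_haps, transmissions))
    return configs


if __name__ == "__main__":
    # Both parents have two copies in total, but the child has four:
    # only a 0+2 split in each parent explains the child.
    for config in consistent_trio_configs(2, 2, 4):
        print(config)
```

In this toy case the trio removes the ambiguity that unrelated diploid copy numbers leave open: a parent with two copies could carry 1+1 or 0+2, and only the child's copy number shows that each parent carries a two-copy haplotype. An empty result would flag a Mendelian inconsistency under these assumptions.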
We found that two overlapping duplications in the 17q21.31 region rose rapidly and independently to high frequency within European populations, and may account for the regional association with female fertility and the female meiotic recombination rate. We also found that a recurrent deletion in the HP gene is associated with total cholesterol and LDL cholesterol levels. The methods developed in this thesis enable the integration of structurally complex variation into future association studies so that we can begin to understand its effects on phenotypes.
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:14226090
- FAS Theses and Dissertations