Publication: Large multi-allelic copy number variations in humans
Abstract
Thousands of genome segments appear to be present in widely varying copy number in different human genomes. We developed ways to use increasingly abundant whole genome sequence data to identify the copy numbers, alleles and haplotypes present at most large, multi-allelic CNVs (mCNVs). We analyzed 849 genomes sequenced by the 1000 Genomes Project to identify most large (>5 kb) mCNVs, including 3,878 duplications, of which 1,356 appear to have three or more segregating alleles. We find that mCNVs give rise to most human gene-dosage variation – exceeding sevenfold the contribution of deletions and biallelic duplications – and that this variation in gene dosage generates abundant variation in gene expression. We describe “runaway duplication haplotypes” in which genes, including HPR and ORM1, have mutated to high copy number on specific haplotypes. We describe partially successful initial strategies for analyzing mCNVs via imputation and provide an initial data resource to support such analyses.
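As a rough illustration of what "three or more segregating alleles" implies for observed data, the sketch below computes a lower bound on the number of distinct copy-number alleles needed to explain a set of integer diploid copy numbers. It is not the authors' method. It assumes error-free integer diploid copy-number calls, and the function name `min_allele_count` is hypothetical. The logic is that one allele with copy number a can only produce the diploid value 2a, and two alleles a and b can only produce 2a, a+b and 2b; anything not explained that way needs at least three alleles.

```python
from itertools import combinations


def min_allele_count(diploid_copy_numbers):
    """Lower bound on the number of copy-number alleles at a locus.

    Returns 1, 2, or 3, where 3 means "at least three".
    Assumes exact integer diploid copy numbers (an idealization).
    """
    observed = set(diploid_copy_numbers)
    if not observed:
        raise ValueError("need at least one diploid copy number")
    max_cn = max(observed)

    # One allele with copy number a: every genome has diploid copy number 2a.
    for a in range(max_cn + 1):
        if observed <= {2 * a}:
            return 1

    # Two alleles a < b: diploid copy numbers can only be 2a, a+b, or 2b.
    for a, b in combinations(range(max_cn + 1), 2):
        if observed <= {2 * a, a + b, 2 * b}:
            return 2

    # No one- or two-allele model fits, so at least three alleles segregate.
    return 3
```

Under these assumptions, diploid copy numbers of 2, 3 and 4 are compatible with a biallelic duplication (alleles of 1 and 2 copies), whereas observing 2, 3, 4 and 5 requires at least three alleles. The converse does not hold: a locus that passes the two-allele check may still carry more alleles, so the result is only a lower bound.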