Genome-Wide Detection of Structural Variants and Signatures of Their Selection in Cancer

Date

2017-05-10

The Harvard community has made this article openly available.

Abstract

Structural variants (SVs) are a heterogeneous class of genomic variation that can have profound effects on the structure and function of the cancer genome. SVs are challenging to detect in short-read sequencing data with standard alignment methods. Sequence assembly offers a powerful detection approach, but it is difficult to apply genome-wide because of its computational complexity and the difficulty of extracting SVs from assemblies. I describe SvABA, an efficient and accurate method for detecting SVs using genome-wide local assembly. Evaluated on the NA12878 human genome and on simulated and real cancer genomes, SvABA demonstrates superior sensitivity and specificity across a broad spectrum of SVs and substantially improves detection performance over existing methods for variants in the 20-300 bp range. SvABA also identifies complex somatic rearrangements containing chains of short (< 1,000 bp) templated-sequence insertions copied from distant genomic regions. I further describe the application of SvABA to several cancer sequencing projects to reveal both indels and rearrangements that drive cancer.

I next analyze rearrangements in 2,693 cancer whole genomes from the International Cancer Genome Consortium (ICGC). To understand the mechanistic and selective pressures shaping these variants, I describe a two-part paradigm for analyzing rearrangement breakpoints and the fusions connecting disparate loci. I find that breakpoint rates exhibit substantial heterogeneity across the genome and among tumor types, and that breakpoints are enriched in open chromatin and at sites with high densities of repetitive elements. After accounting for these mechanistic factors, I find enrichment of breakpoints within 0.3% of the genome, including novel focal microdeletions at BRD4 in breast and ovarian cancers. For fusions, the major determinant of whether two loci are fused is the genomic distance between them. Accounting for this distance distribution, I identify significantly recurrent fusion events, including a novel recurrent t(2;7) translocation between THADA and IGF2BP3 in thyroid cancer. I further find that chromatin structure and the relative homology between breakpoints in the context of repetitive elements significantly influence the distribution of somatic fusions.

Finally, I describe a suite of open-access C++ tools, including VariantBam, a tool for extracting variant-containing reads from sequencing files, and SeqLib, a toolkit for sequence alignment and sequence assembly.
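
The breakpoint analysis in the abstract compares observed breakpoint counts with what would be expected once mechanistic factors are accounted for. The sketch below illustrates only that general idea; it is not the dissertation's statistical model. It assumes an expected breakpoint count per genomic bin has already been estimated from covariates such as chromatin state and repeat density (how those expectations are obtained is not shown), and it substitutes a simple Poisson upper-tail test with Benjamini-Hochberg false-discovery-rate control. The names `poisson_sf`, `benjamini_hochberg`, and `flag_enriched_bins` and the `alpha` threshold are hypothetical.

```python
import math
from typing import List, Sequence


def poisson_sf(k: int, mu: float) -> float:
    """Return P(X >= k) for X ~ Poisson(mu) by summing the upper tail directly.

    Summing the tail (rather than computing 1 - CDF) keeps very small
    p-values accurate. Intended for modest expected counts per bin.
    """
    if k <= 0:
        return 1.0
    if mu <= 0.0:
        return 0.0  # degenerate model: a count >= 1 has probability 0
    log_mu = math.log(mu)
    total = 0.0
    i = k
    while True:
        # Each term is computed in log space to avoid overflow in mu**i and i!.
        term = math.exp(-mu + i * log_mu - math.lgamma(i + 1))
        total += term
        # Beyond the mode (i > mu) terms only shrink, so stop once negligible.
        if i > mu and term <= total * 1e-16:
            break
        i += 1
    return min(1.0, total)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        qvals[idx] = running_min
    return qvals


def flag_enriched_bins(observed: Sequence[int], expected: Sequence[float],
                       alpha: float = 0.1) -> List[int]:
    """Indices of bins whose observed breakpoint count exceeds the
    (hypothetical, precomputed) expected count at FDR < alpha."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    pvals = [poisson_sf(k, mu) for k, mu in zip(observed, expected)]
    qvals = benjamini_hochberg(pvals)
    return [i for i, q in enumerate(qvals) if q < alpha]


if __name__ == "__main__":
    # Toy data: bin 2 has far more breakpoints than its expected count.
    print(flag_enriched_bins([3, 4, 40, 2], [3.0, 4.0, 5.0, 2.5]))  # [2]
```

With the toy counts above, only bin 2 is flagged. A Poisson background ignores overdispersion across samples; a negative binomial background is a common, more conservative alternative for this kind of count data.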

Keywords

Biology, Genetics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.
