Publication: The Tangent Copy-Number Inference Pipeline for Cancer Genome Analyses
Date
2020-06-24
Citation
Oh, Coyin. 2020. The Tangent Copy-Number Inference Pipeline for Cancer Genome Analyses. Doctoral dissertation, Harvard Medical School.
Abstract
Somatic copy-number alterations (SCNAs) play an important role in the development of cancer. In cancer genome analyses, accurate profiling of SCNAs is often hindered by noise in microarray and next-generation sequencing data. Here, we present the Tangent copy-number inference pipeline, which employs a novel normalization approach to reduce systematic noise in copy-number profiles. Tangent normalization first constructs a noise model using the weighted sum of noise profiles from a collection of normal samples that most closely matches the tumor noise profile. The estimated noise profile is then subtracted from the tumor signal to generate copy-number data for the tumor. The performance of Tangent thus hinges on having adequate representation of noise profiles in the normal reference collection. Since there are often practical limitations to obtaining a sufficient number of normal samples, we also describe the Pseudo-Tangent pipeline, an adaptation of Tangent that generates signal-subtracted tumor profiles that can be used to augment the reference collection. We applied Tangent to single-nucleotide polymorphism (SNP) array data and whole-exome sequencing data in The Cancer Genome Atlas (TCGA). Tangent normalization offers a significant reduction in noise and an improvement in signal-to-noise ratio compared to conventional normalization approaches. When normal samples are limited, Pseudo-Tangent also provides a substantial reduction in systematic noise compared to Tangent alone and to other conventional approaches. Tangent and Pseudo-Tangent are broadly applicable to multiple sequencing technologies for more accurate inference of SCNAs in the cancer genome. As part of TCGA, we have published the copy-number results for SNP array data of over 10,000 tumor-normal pairs that were analyzed with the Tangent pipeline. We have also made Tangent and Pseudo-Tangent publicly available for download and implementation.
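The normalization step described in the abstract (fit a weighted sum of normal-sample noise profiles to the tumor profile, then subtract it) can be illustrated with a minimal sketch. This is not the Tangent implementation: it assumes that "most closely matches" means an ordinary least-squares fit, and the function name `tangent_normalize`, the array layout, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def tangent_normalize(tumor, normals):
    """Subtract the best-fitting weighted sum of normal profiles from a tumor profile.

    tumor:   (n_markers,) log2 ratios for one tumor sample
    normals: (n_markers, n_normals) matrix, one column per normal sample
    Assumption: "closest match" is taken to be the least-squares fit.
    """
    # Weights w minimizing ||normals @ w - tumor||_2
    weights, *_ = np.linalg.lstsq(normals, tumor, rcond=None)
    noise_estimate = normals @ weights
    return tumor - noise_estimate, weights

# Synthetic example (all values are made up for illustration).
rng = np.random.default_rng(0)
n_markers, n_normals = 500, 5

# Each normal sample carries only systematic noise.
normals = rng.normal(size=(n_markers, n_normals))

# The tumor carries a copy-number gain plus a mixture of the same noise.
signal = np.zeros(n_markers)
signal[100:200] = 0.58  # roughly log2(3/2), a single-copy gain
true_weights = rng.normal(size=n_normals)
tumor = signal + normals @ true_weights

normalized, fitted_weights = tangent_normalize(tumor, normals)

# Compare the deviation from the true signal before and after normalization.
noise_before = np.std(tumor - signal)
noise_after = np.std(normalized - signal)
print(f"noise before: {noise_before:.3f}, after: {noise_after:.3f}")
```

In this sketch the fit also removes whatever part of the true signal happens to lie in the span of the normal profiles, which is one reason the size and composition of the reference collection matter. A Pseudo-Tangent-style extension would, under the same assumptions, append signal-subtracted tumor profiles as extra columns of `normals`.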
Keywords
cancer genomics, computational algorithms, bioinformatics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service