The Tangent Copy-Number Inference Pipeline for Cancer Genome Analyses
Access Status: Full text of the requested work is not available in DASH at this time ("dark deposit").
Citation: Oh, Coyin. 2020. The Tangent Copy-Number Inference Pipeline for Cancer Genome Analyses. Doctoral dissertation, Harvard Medical School.
Abstract: Somatic copy-number alterations (SCNAs) play an important role in the development of cancer. In cancer genome analyses, accurate profiling of SCNAs is often hindered by noise in microarray and next-generation sequencing data. Here, we present the Tangent copy-number inference pipeline, which employs a novel normalization approach to reduce systematic noise in copy-number profiles. Tangent normalization first constructs a noise model as the weighted sum of noise profiles from a collection of normal samples that most closely matches the tumor noise profile. The estimated noise profile is then subtracted from the tumor signal to generate copy-number data for the tumor. The performance of Tangent thus hinges on adequate representation of noise profiles in the normal reference collection. Since there are often practical limitations to obtaining a sufficient number of normal samples, we also describe the Pseudo-Tangent pipeline, an adaptation of Tangent that generates signal-subtracted tumor profiles that can be used to augment the reference collection. We applied Tangent to single-nucleotide polymorphism (SNP) array data and whole-exome sequencing data in The Cancer Genome Atlas (TCGA). Tangent normalization offers a significant reduction in noise and an improvement in signal-to-noise ratio compared to conventional normalization approaches. When normal samples are limited, Pseudo-Tangent also provides a substantial reduction in systematic noise compared to Tangent alone and to other conventional approaches. Tangent and Pseudo-Tangent are broadly applicable to multiple sequencing technologies for more accurate inference of SCNAs in the cancer genome. As part of TCGA, we have published the copy-number results for SNP array data of over 10,000 tumor-normal pairs that were analyzed with the Tangent pipeline. We have also made Tangent and Pseudo-Tangent publicly available for download and use.
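The normalization step described in the abstract (find the weighted sum of normal-sample noise profiles that best matches the tumor, then subtract it) can be illustrated with a small sketch. This is not the published Tangent code. It assumes that "most closely matches" means an ordinary least-squares fit, that profiles are vectors of log-scale coverage over a common set of markers, and that the function name `tangent_like_normalize` and the synthetic data are purely hypothetical.

```python
import numpy as np


def tangent_like_normalize(tumor, normals):
    """Subtract the best-fitting weighted sum of normal profiles from a tumor profile.

    tumor   : array of shape (n_markers,), assumed log-scale coverage
    normals : array of shape (n_markers, n_normals), one column per normal sample

    Assumption: the weights are chosen by ordinary least squares; the actual
    pipeline may use a different fitting procedure.
    """
    # Weights w minimizing ||normals @ w - tumor||^2
    weights, *_ = np.linalg.lstsq(normals, tumor, rcond=None)
    noise_estimate = normals @ weights
    return tumor - noise_estimate, weights


# Synthetic illustration (made-up data, not TCGA results)
rng = np.random.default_rng(0)
n_markers, n_normals = 500, 5
normals = rng.normal(size=(n_markers, n_normals))   # noise profiles of normals
true_weights = np.array([0.5, -0.2, 0.1, 0.0, 0.3])

signal = np.zeros(n_markers)
signal[100:150] = 1.0                               # a simulated copy-number gain

tumor = signal + normals @ true_weights             # tumor = signal + systematic noise
denoised, weights = tangent_like_normalize(tumor, normals)
```

In this toy setup the subtracted profile lies much closer to the simulated signal than the raw tumor profile does. The fit also removes the small part of the true signal that happens to lie in the span of the normals, which is one reason the composition of the reference collection matters.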
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37365217