Publication:
Development of Methods for Cancer Genome Analysis and Clinical Applications

Date

2022-06-06

Citation

Lin, Ziao. 2022. Development of Methods for Cancer Genome Analysis and Clinical Applications. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Cancer, a disease with increasing morbidity and mortality, was responsible for almost 20 million new cases and 10 million deaths worldwide in 2020, making it one of the most significant public health issues of our time. High-throughput sequencing technologies and computational analysis methods have given us powerful tools to interrogate the cancer genome, opening new opportunities for more effective and precise diagnostic and therapeutic approaches. As a result, clinical outcomes of cancer patients have improved significantly over the last two decades. The successful application of personalized medicine requires not only accurate characterization of each patient's cancer genome, but also robust interpretation of the driver events within that genome. Building a molecular map of the events that drive tumorigenesis requires analyzing data from many patients with the same cancer type in a uniform way. To illustrate this idea, I focus on chronic lymphocytic leukemia (CLL), a B cell neoplasm with a variable natural history that is conventionally divided into two major subtypes according to the extent of somatic mutation in the immunoglobulin heavy chain variable region genes (IGHV). Collaborating with physicians and scientists around the globe, we created by far the largest such dataset, comprising multi-layer molecular and clinical data from 1156 CLL patients. Unlike previous analyses, which provided only fragments of the ‘CLL map’ because each focused on a particular patient population or a single data type, this complete dataset gave us sufficient power and resolution to fully characterize the bioclinical spectrum of the disease. 
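The two CLL subtypes mentioned above are conventionally called by comparing a patient's IGHV rearrangement with its closest germline gene: under the widely used clinical convention, identity below 98% is classed as mutated (M-CLL) and 98% or higher as unmutated (U-CLL). The sketch below assumes the sequences have already been aligned by a dedicated tool and shows only the identity calculation and the cutoff; the function names and the gap-skipping rule are hypothetical, and this is not the pipeline used in this work.

```python
def percent_identity(query: str, germline: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns where either sequence has a gap ('-') are skipped. This is a
    simplifying assumption; dedicated tools apply their own indel rules.
    """
    if len(query) != len(germline):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = matches = 0
    for q, g in zip(query.upper(), germline.upper()):
        if q == "-" or g == "-":
            continue  # ignore gap columns
        compared += 1
        matches += q == g
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


def ighv_status(identity: float, cutoff: float = 98.0) -> str:
    """Classify IGHV status with the conventional 98% germline-identity cutoff."""
    return "M-CLL" if identity < cutoff else "U-CLL"


germline = "ACGT" * 25          # toy 100-nt germline segment
patient = "TTT" + germline[3:]  # three substitutions -> 97% identity
identity = percent_identity(patient, germline)
print(identity, ighv_status(identity))  # 97.0 M-CLL
```

The cutoff is a single parameter so that cohorts using a borderline band (for example, flagging values just under 98%) can adjust it without touching the identity calculation.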
By developing and applying novel computational analysis methods, we identified 202 candidate genetic drivers of CLL (109 of them novel) and refined the characterization of each IGHV subtype, revealing distinct genetic landscapes with unique patterns of leukemogenic trajectory. We also discovered new gene expression subtypes that further subdivide this neoplasm; expression subtype proved to be an independent prognostic factor. Clinical outcomes were associated with a combination of genetic alterations, epigenetic states, and gene expression clusters, further advancing the prognostic paradigm. Overall, this work offers fresh insights into CLL oncogenesis and prognostication, and the curated and harmonized dataset will serve as a valuable resource for the research community in future studies. Because detecting cancer at early stages or upon recurrence is critical to decreasing cancer morbidity and mortality, I have also developed TuFEst (Tumor Fraction Estimator), a highly sensitive, accurate, and cost-effective computational approach for pan-cancer detection and tumor burden estimation from ultra-low-pass whole-genome sequencing (ULP-WGS, at around 0.1x coverage) of cell-free DNA (cfDNA), which can be collected minimally invasively. Current state-of-the-art methods estimate tumor fraction (TF) from ULP-WGS relying exclusively on total copy number variation, so they lose the tumor signal in tumors that are copy number–quiet or dominated by copy-neutral loss of heterozygosity. To overcome this limitation, TuFEst integrates copy number signals with altered fragment length, and has demonstrated the potential to achieve a lower detection limit (e.g. a TF of 0.1%) together with more accurate TF estimation across various cancer types. 
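TuFEst's model is not specified in this abstract. As a minimal sketch of the two kinds of signal it combines, the code below assumes the standard mixture relationship used by copy-number-based tumor fraction tools: a segment where tumor cells carry c copies, in a sample whose tumor ploidy is P and whose non-tumor cfDNA is diploid, has an expected copy ratio of (TF·c + 2(1 - TF)) / (TF·P + 2(1 - TF)), which can be inverted to estimate TF from an observed ratio. For the fragment-length side, it computes the fraction of fragments in a short-size window, since ctDNA fragments tend to be shorter than the roughly 166 bp mononucleosomal peak. The 90-150 bp window and all function names are hypothetical choices, and the two features are not fused here.

```python
import math
from typing import Sequence


def expected_log2_ratio(tf: float, tumor_cn: int, tumor_ploidy: float = 2.0) -> float:
    """Expected log2 copy ratio of a segment under a tumor/normal mixture.

    A fraction `tf` of cfDNA comes from tumor cells carrying `tumor_cn` copies
    of the segment; the remainder is diploid. Ratios are normalized to the
    sample's average copy number (tumor ploidy mixed with diploid DNA).
    """
    segment = tf * tumor_cn + 2.0 * (1.0 - tf)
    average = tf * tumor_ploidy + 2.0 * (1.0 - tf)
    return math.log2(segment / average)


def tf_from_log2_ratio(log2_ratio: float, tumor_cn: int, tumor_ploidy: float = 2.0) -> float:
    """Invert expected_log2_ratio for TF, given an assumed clonal copy state."""
    r = 2.0 ** log2_ratio
    denom = r * (tumor_ploidy - 2.0) - (tumor_cn - 2.0)
    if abs(denom) < 1e-12:
        # Degenerate case (e.g. tumor_cn == tumor_ploidy == 2): the ratio
        # does not depend on TF, so TF cannot be recovered from it.
        raise ValueError("this copy state carries no tumor-fraction signal")
    return 2.0 * (1.0 - r) / denom


def short_fragment_fraction(lengths: Sequence[int], lo: int = 90, hi: int = 150) -> float:
    """Fraction of cfDNA fragments whose length falls in [lo, hi] bp."""
    if not lengths:
        raise ValueError("no fragments supplied")
    return sum(lo <= n <= hi for n in lengths) / len(lengths)


# A clonal single-copy loss seen at copy ratio 0.95 in a near-diploid tumor:
print(round(tf_from_log2_ratio(math.log2(0.95), tumor_cn=1), 3))  # 0.1
print(short_fragment_fraction([100, 120, 166, 170, 145]))          # 0.6
```

The example reflects the limitation described in the text: when the tumor copy state equals the ploidy (a copy number–quiet segment), the expected ratio is 1 for every TF, so a copy-number-only estimator has nothing to fit there, whereas a fragment-length feature can still shift with TF.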
Clinical application of TuFEst suggests its power to detect early cancer recurrence under different therapies: for example, in one breast cancer patient receiving CDK4/6 inhibitors, TuFEst analysis signaled relapse 262 days earlier than routine imaging. Altogether, our work suggests that accurate estimation of TF in cfDNA can not only aid in detecting cancer at early stages but also provide evidence of disease progression during treatment. We believe that such a non-invasive, cost-effective, pan-cancer detection method will benefit both early cancer screening and relapse monitoring. Finally, cfDNA has shown potential as a proxy for tumor biopsies in cancer genomic profiling to guide clinical management, especially when standard biopsies are difficult to obtain. To fully understand the tumor composition of cfDNA, we sought to develop methods to characterize the relationship between cfDNA and the body-wide tumor phylogeny, and to compare cfDNA with standard biopsies in their ability to identify truncal mutations. Truncal mutations are the "founder" mutations that accumulate in cancer cells before the last selective sweep. They are therefore expected to be present in all cancer cells (including those at un-biopsied sites), unless some subclones lose them through copy number deletion during subsequent evolution. Given this high prevalence across cancer cells, truncal mutations are potentially valuable therapeutic targets. In this analysis, we collected plasma and multiple biopsies from the primary tumor and metastatic sites during autopsies of three pancreatic cancer patients, performed shortly after death. The results demonstrate that cfDNA contains multiple tumor clones, indicating that circulating tumor DNA (ctDNA) is a mixture of DNA shed from various tumor clones across the body. 
This further suggests that cfDNA provides a real-time representation of tumor heterogeneity across the whole body. In addition, ctDNA faithfully reflects truncal mutations and helps overcome the sampling bias of a single biopsy site. We believe that these two attractive properties make cfDNA a good proxy for tumor biopsy–derived DNA in guiding cancer management.
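The truncal-mutation reasoning above is usually made quantitative through the cancer cell fraction (CCF), the fraction of cancer cells carrying a mutation. CCF is estimated from the variant allele fraction (VAF) after correcting for purity and local copy number: CCF = VAF × (p·CN_t + 2(1 - p)) / (p·m), where p is sample purity (the tumor fraction, for plasma), CN_t is the tumor copy number at the locus, and m is the number of mutated copies. The sketch below applies that standard correction and calls a mutation a candidate truncal mutation when its CCF is high in every sample (biopsies and plasma). The 0.8 threshold, the all-samples rule, and the function names are hypothetical simplifications, not the dissertation's phylogenetic method.

```python
from typing import Sequence


def cancer_cell_fraction(vaf: float, purity: float, tumor_cn: int,
                         multiplicity: int = 1, normal_cn: int = 2) -> float:
    """Purity- and copy-number-corrected CCF of a single mutation."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Allele copies at the locus, averaged over tumor and normal cells.
    total_copies = purity * tumor_cn + normal_cn * (1.0 - purity)
    return vaf * total_copies / (purity * multiplicity)


def is_candidate_truncal(ccfs: Sequence[float], threshold: float = 0.8) -> bool:
    """Hypothetical rule: truncal if CCF >= threshold in every sample."""
    return len(ccfs) > 0 and all(c >= threshold for c in ccfs)


# VAF 0.25 at a diploid locus in a sample of 50% purity -> CCF 1.0 (clonal)
print(cancer_cell_fraction(0.25, purity=0.5, tumor_cn=2))  # 1.0
```

Each biopsy and the plasma sample would get its own purity and copy-number estimates before CCFs are compared. A low CCF at one site does not by itself rule out a truncal origin, since, as the text notes, a subclone may later have deleted the mutated allele.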

Keywords

Cancer, Cell-free DNA, Computation, Genomics, Methods, Bioinformatics, Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.