Publication:
Pangenome Alignment: An Improved Method to Accurately Map Telomeric Long-Reads and Its Application in the Analysis of Alternative Lengthening of Telomeres (ALT) Positive Cells

Date

2023-06-30

Citation

Garrity-Janger, Max. 2023. Pangenome Alignment: An Improved Method to Accurately Map Telomeric Long-Reads and Its Application in the Analysis of Alternative Lengthening of Telomeres (ALT) Positive Cells. Bachelor's thesis, Harvard University Engineering and Applied Sciences.

Abstract

The ability to accurately align telomeric DNA sequencing reads is crucial to understanding a variety of biologically significant pathways, notably cancer proliferation and tumorigenesis. This thesis assesses how accurately standard haploid alignment methods align telomeric reads generated by long-read whole-genome sequencing (WGS). We then propose an alternative pangenome alignment approach to telomeric read mapping and compare its mapping accuracy to that of haploid alignment. Finally, we apply this novel pangenome alignment approach to analyze telomere length heterogeneity in cancer cells that depend on the alternative lengthening of telomeres (ALT) pathway. To assess the mapping error rates of the conventional haploid and novel pangenome alignment approaches, PBSIM2, a long-read WGS simulator, was used to simulate sequencing data from a variety of references. Telomeric reads were then extracted from this simulated data and mapped to a haploid, diploid, or pangenome reference genome. We then compared the overall and chromosomal arm-specific mismapping rates for each alignment approach. Next, we applied both haploid and pangenome alignment to long-read WGS data from an ALT+ U2OS osteosarcoma cell line and estimated the arm-specific telomeric read length distribution. Lastly, we used CRISPR/Cas9 to sequentially inactivate ATRX and telomerase to build an isogenic model of telomerase-dependent and ALT-dependent cells that can be used to study the chromosome-specific telomere changes that occur during the development of ALT. We found that haploid alignment results in high overall mismapping rates of between 10 and 40 percent when mapping telomeric sequencing reads, even those that extend well into the sub-telomeric regions. Furthermore, certain chromosomes with higher levels of paralogy displayed even higher mismapping rates of well over 50 percent.
In contrast, our proposed pangenome alignment approach consistently achieved overall mismapping rates of less than 10 percent, a reduction in error of 50 percent or more compared to haploid alignment. When chromosome-specific mismapping error was evaluated, pangenome alignment also reduced arm-specific error rates. We then analyzed long-read WGS data from ALT+ osteosarcoma cells (U2OS) using pangenome alignment and found that these cells harbor chromosomal arm-specific telomere length heterogeneity. Lastly, we successfully inactivated ATRX in telomerase-positive, TERT promoter-mutant SF295 glioblastoma cells. We found that this leads to some features of ALT; however, the ATRX-knockout cells remained dependent on the telomerase protein component TERT. These results emphasize the need for improved mapping methods for highly repetitive telomeric sequences, which display significant interchromosomal paralogy and genetic diversity across samples. The ability to accurately map telomeric reads is crucial to our understanding of pathways such as ALT and of structural rearrangements in cancers, with major implications for the biology and treatment of human disease.
KEYWORDS: Long-read WGS, Telomeres, Pangenome, Telomerase, ATRX, TERC, TERT, Alternative Lengthening of Telomeres (ALT), Cancer
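The comparison of overall and arm-specific mismapping rates described in the abstract can be illustrated with a minimal sketch. This is not the thesis's code: the function names, the input format (pairs of true and mapped chromosome arm for each simulated read), and the choice to count unmapped reads as mismapped are all assumptions made for illustration.

```python
from collections import defaultdict


def mismapping_rates(reads):
    """Compute overall and per-arm mismapping rates.

    `reads` is an iterable of (true_arm, mapped_arm) pairs, where true_arm is
    the arm a simulated read was drawn from and mapped_arm is where the
    aligner placed it. Assumption: an unmapped read (mapped_arm is None)
    counts as mismapped.
    """
    total = 0
    wrong = 0
    per_arm = defaultdict(lambda: [0, 0])  # arm -> [mismapped, total]
    for true_arm, mapped_arm in reads:
        total += 1
        per_arm[true_arm][1] += 1
        if mapped_arm != true_arm:
            wrong += 1
            per_arm[true_arm][0] += 1
    overall = wrong / total if total else 0.0
    by_arm = {arm: w / n for arm, (w, n) in per_arm.items()}
    return overall, by_arm


def relative_error_reduction(baseline_rate, improved_rate):
    """Fractional reduction in error relative to a nonzero baseline rate."""
    return (baseline_rate - improved_rate) / baseline_rate


# Hypothetical toy input, not data from the thesis.
toy_reads = [("1p", "1p"), ("1p", "5q"), ("5q", "5q"), ("5q", "5q")]
overall_rate, arm_rates = mismapping_rates(toy_reads)
```

Under these assumptions, a drop in overall mismapping from 20 percent to 10 percent corresponds to `relative_error_reduction(0.20, 0.10)`, a 50 percent relative reduction, which is the sense in which the abstract compares the two approaches.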

Keywords

Biology

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
