Publication:

Peering into the dark matter of the M. tuberculosis genome using long-read sequencing

Date

2023-06-01

Citation

Marin, Maximillian Gabriel. 2023. Peering into the dark matter of the M. tuberculosis genome using long-read sequencing. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Tuberculosis (TB) is an infectious disease responsible for over 1 million deaths per year. TB primarily manifests as a pulmonary infection but can also disseminate throughout the body. It is caused by bacteria of the Mycobacterium tuberculosis complex (MTBC). Understanding the evolution of the MTBC is critical for developing more effective TB vaccines, combating antibiotic resistance, and identifying the factors that have enabled it to become such a successful pathogen.

Short-read whole-genome sequencing (SR-WGS) has greatly advanced our understanding of the genetic diversity of the MTBC. However, due to limitations of short-read sequencing, a significant portion (~10%) of the MTBC genome has been systematically excluded from analysis. In this work, we use long-read sequencing to confidently study this ~10% of the genome, generating 158 high-quality complete genome assemblies from the major lineages of human-adapted MTBC. This allows us to uncover new aspects of MTBC evolution, as well as to benchmark common analysis approaches in microbial genomics.

In Chapter 1, we use 36 complete assemblies to systematically evaluate the accuracy of short-read whole-genome sequencing for variant calling of MTBC isolates. These benchmarking results have broad implications for the use of SR-WGS in the study of MTBC biology, inference of transmission in public health surveillance systems, and WGS applications in other organisms. In Chapter 2, we leverage 158 complete genome assemblies to evaluate genome conservation and structural variation. Additionally, we benchmark several common pan-genome analysis pipelines and find that they are prone to overestimating accessory genome size. In Chapter 3, we present evidence that gene conversion is a key driver of genetic diversity in a set of hotspots within the MTBC genome. A majority of gene conversion events affect substrates (PE, PPE, and Esx proteins) of the ESX secretion systems, which are implicated in virulence. These findings suggest that an understudied evolutionary force is acting on the MTBC genome.

Keywords

Computational Biology, Genomics, Infectious disease, Molecular Evolution, Mycobacterium tuberculosis, Bioinformatics, Genetics, Evolution & development

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
