Publication: Identifying and Quantifying Novel Bacteria
Date
2021-06-23
The Harvard community has made this article openly available. Please share how this access benefits you.
Citation
Nchinda-Pungong, Nkaziewoh Ndabong. 2021. Identifying and Quantifying Novel Bacteria. Bachelor's thesis, Harvard College.
Abstract
The tree of life lies at the heart of biology, but major gaps persist among bacteria. Attempts to identify these missing microbes face challenges in determining which organisms are poorly characterized and where to find them. Here, we have devised a bioinformatics-based pipeline for identifying novel organisms and assessing their relative abundance in different environments based on 16S sequences. Using data from GTDB, we validate that the 16S V4 region can be used to estimate the novelty of an organism’s whole genome. Then, we apply the pipeline to 16S SILVA data, estimating how many organisms remain to be discovered at each taxonomic level. We also determine that V4 sequencing is likely to underestimate genome novelty relative to the full 16S. Next, we apply the pipeline to datasets from the Earth Microbiome Project, assessing the relative abundance of novel organisms in different environments. Our results indicate that soil samples contain the highest volume of novel bacteria, but the optimal environment for microbial discovery varies with the desired taxonomic level of novel organisms and with laboratory sequencing capacity. We then apply the pipeline to standardized samples collected from several environments, determining that salt marsh soil contains a high density of novel organisms. Lastly, we use the pipeline to enrich one marsh sample for novel organisms, assembling a novel Gracilibacteria genome in the process. This pipeline allows researchers to compare environments for microbial sequencing and to enrich for novel organisms, accelerating the discovery of novel bacteria.
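The abstract does not spell out how novelty is scored, so the following is only a minimal sketch of one way such a 16S-based estimate could work, not the thesis's implementation. It assumes that each query sequence has already been aligned to its closest reference, that novelty is judged from percent identity to that reference, and that the per-rank identity cutoffs are the commonly cited 16S values; the cutoffs, function names, and input layout are all illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# ASSUMPTION: illustrative 16S identity cutoffs (percent), ordered from the
# broadest rank to the most specific. They are not taken from the thesis.
RANK_THRESHOLDS: List[Tuple[str, float]] = [
    ("phylum", 75.0),
    ("class", 78.5),
    ("order", 82.0),
    ("family", 86.5),
    ("genus", 94.5),
    ("species", 98.7),
]


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    # Ignore columns where both sequences carry a gap; a gap opposite a
    # base still counts as a mismatch.
    columns = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def novelty_rank(best_identity: float) -> Optional[str]:
    """Broadest rank at which the query looks novel, or None if it is known."""
    for rank, cutoff in RANK_THRESHOLDS:
        if best_identity < cutoff:
            return rank
    return None


def novel_fraction(
    counts: Dict[str, int], best_hits: Dict[str, float], rank: str
) -> float:
    """Fraction of a sample's reads whose sequence is novel at `rank`.

    `counts` maps a sequence ID to its read count in the sample and
    `best_hits` maps the same ID to its best percent identity against
    the reference database (both hypothetical input layouts).
    """
    cutoff = dict(RANK_THRESHOLDS)[rank]
    total = sum(counts.values())
    if total == 0:
        return 0.0
    novel = sum(n for seq, n in counts.items() if best_hits[seq] < cutoff)
    return novel / total
```

Under these assumptions, environments could be compared by computing `novel_fraction` for each sample at the rank of interest; a real pipeline would obtain the best-hit identities from a dedicated alignment or search tool rather than from this simple column comparison.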
Keywords
16S, Bacteria, Bioinformatics, Metagenomics, Taxonomy, Bioengineering, Microbiology
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service