Gene Prediction with Glimmer for Metagenomic Sequences Augmented by Classification and Clustering

Title: Gene Prediction with Glimmer for Metagenomic Sequences Augmented by Classification and Clustering
Author: Kelley, David Roy; Liu, Bo; Delcher, Arthur L.; Pop, Mihai; Salzberg, Steven L.

Note: Order does not necessarily reflect citation order of authors.

Citation: Kelley, David R., Bo Liu, Arthur L. Delcher, Mihai Pop, and Steven L. Salzberg. 2011. Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering. Nucleic Acids Research 40(1): e9.
Abstract: Environmental shotgun sequencing (or metagenomics) is widely used to survey the communities of microbial organisms that live in many diverse ecosystems, such as the human body. Finding the protein-coding genes within the sequences is an important step for assessing the functional capacity of a metagenome. In this work, we developed a metagenomics gene prediction system Glimmer-MG that achieves significantly greater accuracy than previous systems via novel approaches to a number of important prediction subtasks. First, we introduce the use of phylogenetic classifications of the sequences to model parameterization. We also cluster the sequences, grouping together those that likely originated from the same organism. Analogous to iterative schemes that are useful for whole genomes, we retrain our models within each cluster on the initial gene predictions before making final predictions. Finally, we model both insertion/deletion and substitution sequencing errors using a different approach than previous software, allowing Glimmer-MG to change coding frame or pass through stop codons by predicting an error. In a comparison among multiple gene finding methods, Glimmer-MG makes the most sensitive and precise predictions on simulated and real metagenomes for all read lengths and error rates tested.
Published Version: doi:10.1093/nar/gkr1067
Other Sources: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245904/pdf/
Terms of Use: This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:11248787
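
Illustrative note: the abstract says that Glimmer-MG can change coding frame or pass through a stop codon by predicting a sequencing error. The toy sketch below is not Glimmer-MG code and does not reproduce its model. It only shows why a single-base deletion matters for gene prediction: the gene's stop codon disappears from the original reading frame and reappears in a shifted one. The sequence and the helper names (codons, first_stop) are hypothetical and chosen for illustration.

```python
# Toy illustration only: hypothetical helpers, not part of Glimmer-MG.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq, frame=0):
    """Split seq into complete codons starting at offset frame (0, 1, or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def first_stop(seq, frame=0):
    """Return the codon index of the first in-frame stop codon, or None."""
    for idx, codon in enumerate(codons(seq, frame)):
        if codon in STOP_CODONS:
            return idx
    return None


# A made-up six-codon gene: ATG GCT TGG AAA CCC TAA
gene = "ATGGCTTGGAAACCCTAA"

# Simulate a sequencing error that drops the base at index 4.
read = gene[:4] + gene[5:]

print(first_stop(gene, frame=0))  # stop codon found in the original frame
print(first_stop(read, frame=0))  # frame 0 no longer reaches the stop codon
print(first_stop(read, frame=2))  # downstream codons are intact in frame 2
```

A gene finder that assumes error-free reads would see only a truncated or missing gene in such a read. One that can hypothesize a deletion near index 4 could switch frames and recover the remaining codons.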