Gapped Alignment of Protein Sequence Motifs through Monte Carlo Optimization of a Hidden Markov Model

dc.contributor.author Neuwald, Andrew F
dc.contributor.author Liu, Jun
dc.date.accessioned 2010-10-06T19:26:39Z
dc.date.issued 2004
dc.identifier.citation Neuwald, Andrew F., and Jun S. Liu. 2004. Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model. BMC Bioinformatics 5:157. en_US
dc.identifier.issn 1471-2105 en_US
dc.identifier.uri http://nrs.harvard.edu/urn-3:HUL.InstRepos:4460793
dc.description.abstract Background: Certain protein families are highly conserved across distantly related organisms and belong to large and functionally diverse superfamilies. The patterns of conservation present in these protein sequences presumably are due to selective constraints maintaining important but unknown structural mechanisms with some constraints specific to each family and others shared by a larger subset or by the entire superfamily. To exploit these patterns as a source of functional information, we recently devised a statistically based approach called contrast hierarchical alignment and interaction network (CHAIN) analysis, which infers the strengths of various categories of selective constraints from co-conserved patterns in a multiple alignment. The power of this approach strongly depends on the quality of the multiple alignments, which thus motivated development of theoretical concepts and strategies to improve alignment of conserved motifs within large sets of distantly related sequences. Results: Here we describe a hidden Markov model (HMM), an algebraic system, and Markov chain Monte Carlo (MCMC) sampling strategies for alignment of multiple sequence motifs. The MCMC sampling strategies are useful both for alignment optimization and for adjusting position specific background amino acid frequencies for alignment uncertainties. Associated statistical formulations provide an objective measure of alignment quality as well as automatic gap penalty optimization. Improved alignments obtained in this way are compared with PSI-BLAST based alignments within the context of CHAIN analysis of three protein families: Giα subunits, prolyl oligopeptidases, and transitional endoplasmic reticulum (p97) AAA+ ATPases. 
Conclusion: While not entirely replacing PSI-BLAST based alignments, which likewise may be optimized for CHAIN analysis using this approach, these motif-based methods often more accurately align very distantly related sequences and thus can provide a better measure of selective constraints. In some instances, these new approaches also provide a better understanding of family-specific constraints, as we illustrate for p97 ATPases. Programs implementing these procedures and supplementary information are available from the authors. en_US
dc.description.sponsorship Statistics en_US
dc.language.iso en_US en_US
dc.publisher BioMed Central en_US
dc.relation.isversionof doi:10.1186/1471-2105-5-157 en_US
dc.relation.hasversion http://www.ncbi.nlm.nih.gov/pmc/articles/PMC538276/pdf/ en_US
dash.license LAA
dc.title Gapped Alignment of Protein Sequence Motifs through Monte Carlo Optimization of a Hidden Markov Model en_US
dc.type Journal Article en_US
dc.description.version Version of Record en_US
dc.relation.journal BMC Bioinformatics en_US
dash.depositing.author Liu, Jun
dc.date.available 2010-10-06T19:26:39Z

Files in this item

File         Size      Format
538276.pdf   3.372Mb   PDF

This item appears in the following Collection(s)

  • FAS Scholarly Articles [6463]
    Peer reviewed scholarly articles from the Faculty of Arts and Sciences of Harvard University
