Person: Eddy, Sean
Last Name: Eddy
First Name: Sean
Name: Eddy, Sean
5 results
Search Results
Publication: Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families (Oxford University Press, 2018)
Kalvari, Ioanna; Argasinska, Joanna; Quinones-Olvera, Natalia; Nawrocki, Eric P; Rivas, Elena; Eddy, Sean; Bateman, Alex; Finn, Robert D; Petrov, Anton I
Abstract: The Rfam database is a collection of RNA families in which each family is represented by a multiple sequence alignment, a consensus secondary structure, and a covariance model. In this paper we introduce Rfam release 13.0, which switches to a new genome-centric approach that annotates a non-redundant set of reference genomes with RNA families. We describe new web interface features including faceted text search and R-scape secondary structure visualizations. We discuss a new literature curation workflow and a pipeline for building families based on RNAcentral. There are 236 new families in release 13.0, bringing the total number of families to 2687. The Rfam website is http://rfam.org.

Publication: The Dfam database of repetitive DNA families (Oxford University Press, 2016)
Hubley, Robert; Finn, Robert D.; Clements, Jody; Eddy, Sean; Jones, Thomas; Bao, Weidong; Smit, Arian F.A.; Wheeler, Travis J.
Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Dfam is an open access database of families of repetitive DNA elements, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). The initial release of Dfam, featured in the 2013 NAR Database Issue, contained 1143 families of repetitive elements found in humans, and was used to produce more than 100 Mb of additional annotation of TE-derived regions in the human genome, with improved speed. Here, we describe recent advances, most notably expansion to 4150 total families including a comprehensive set of known repeat families from four new organisms (mouse, zebrafish, fly and nematode).
We describe improvements to coverage, and to our methods for identifying and reducing false annotation. We also describe updates to the website interface. The Dfam website has moved to http://dfam.org. Seed alignments, profile HMMs, hit lists and other underlying data are available for download.

Publication: The Pfam protein families database: towards a more sustainable future (Oxford University Press, 2016)
Finn, Robert D.; Coggill, Penelope; Eberhardt, Ruth Y.; Eddy, Sean; Mistry, Jaina; Mitchell, Alex L.; Potter, Simon C.; Punta, Marco; Qureshi, Matloob; Sangrador-Vegas, Amaia; Salazar, Gustavo A.; Tate, John; Bateman, Alex
In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteomes sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans.
The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.

Publication: Parameterizing sequence alignment with an explicit evolutionary model (BioMed Central, 2015)
Rivas, Elena; Eddy, Sean
Background: Inference of sequence homology is inherently an evolutionary question, dependent upon evolutionary divergence. However, the insertion and deletion penalties in the most widely used methods for inferring homology by sequence alignment, including BLAST and profile hidden Markov models (profile HMMs), are not based on any explicitly time-dependent evolutionary model. Using one fixed score system (BLOSUM62 with some gap open/extend costs, for example) corresponds to making an unrealistic assumption that all sequence relationships have diverged by the same time. Adoption of explicit time-dependent evolutionary models for scoring insertions and deletions in sequence alignments has been hindered by algorithmic complexity and technical difficulty.
Results: We identify and implement several probabilistic evolutionary models compatible with the affine-cost insertion/deletion model used in standard pairwise sequence alignment. Assuming an affine gap cost imposes important restrictions on the realism of the evolutionary models compatible with it, as single insertion events with geometrically distributed lengths do not result in geometrically distributed insert lengths at finite times. Nevertheless, we identify one evolutionary model compatible with symmetric pair HMMs that are the basis for Smith-Waterman pairwise alignment, and two evolutionary models compatible with standard profile-based alignment. We test different aspects of the performance of these “optimized branch length” models, including alignment accuracy and homology coverage (discrimination of residues in a homologous region from nonhomologous flanking residues).
We test on benchmarks of both global homologies (full length sequence homologs) and local homologies (homologous subsequences embedded in nonhomologous sequence).
Conclusions: Contrary to our expectations, we find that for global homologies a single long branch parameterization suffices both for distant and close homologous relationships. In contrast, we do see an advantage in using explicit evolutionary models for local homologies. Optimal branch parameterization reduces a known artifact called “homologous overextension”, in which local alignments erroneously extend through flanking nonhomologous residues.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0832-5) contains supplementary material, which is available to authorized users.

Publication: Combinatorial DNA Rearrangement Facilitates the Origin of New Genes in Ciliates (Oxford University Press, 2015)
Chen, Xiao; Jung, Seolkyoung; Beh, Leslie Y.; Eddy, Sean; Landweber, Laura F.
Programmed genome rearrangements in the unicellular eukaryote Oxytricha trifallax produce a transcriptionally active somatic nucleus from a copy of its germline nucleus during development. This process eliminates noncoding sequences that interrupt coding regions in the germline genome, and joins over 225,000 remaining DNA segments, some of which require inversion or complex permutation to build functional genes. This dynamic genomic organization permits some single DNA segments in the germline to contribute to multiple, distinct somatic genes via alternative processing. Like alternative mRNA splicing, the combinatorial assembly of DNA segments contributes to genetic variation and facilitates the evolution of new genes. In this study, we use comparative genomic analysis to demonstrate that the emergence of alternative DNA splicing is associated with the origin of new genes. Short duplications give rise to alternative gene segments that are spliced to the shared gene segments.
Alternative gene segments evolve faster than shared, constitutive segments. Genes with shared segments frequently have different expression profiles, permitting functional divergence. This study reports alternative DNA splicing as a mechanism of new gene origination, illustrating how the process of programmed genome rearrangement gives rise to evolutionary innovation.