Learning “graph-mer” Motifs that Predict Gene Expression Trajectories in Development

Title: Learning “graph-mer” Motifs that Predict Gene Expression Trajectories in Development
Author: Panea, Casandra; Wiggins, Chris H.; Reinke, Valerie; Leslie, Christina; Li, Xuejing

Note: Order does not necessarily reflect citation order of authors.

Citation: Li, Xuejing, Casandra Panea, Chris H. Wiggins, Valerie Reinke, and Christina Leslie. 2010. Learning "graph-mer" motifs that predict gene expression trajectories in development. PLoS Computational Biology 6(4): e1000761.
Abstract: A key problem in understanding transcriptional regulatory networks is deciphering what cis regulatory logic is encoded in gene promoter sequences and how this sequence information maps to expression. A typical computational approach to this problem involves clustering genes by their expression profiles and then searching for overrepresented motifs in the promoter sequences of genes in a cluster. However, genes with similar expression profiles may be controlled by distinct regulatory programs. Moreover, if many gene expression profiles in a data set are highly correlated, as in the case of whole organism developmental time series, it may be difficult to resolve fine-grained clusters in the first place. We present a predictive framework for modeling the natural flow of information, from promoter sequence to expression, to learn cis regulatory motifs and characterize gene expression patterns in developmental time courses. We introduce a cluster-free algorithm based on a graph-regularized version of partial least squares (PLS) regression to learn sequence patterns—represented by graphs of k-mers, or “graph-mers”—that predict gene expression trajectories. Applying the approach to wildtype germline development in Caenorhabditis elegans, we found that the first and second latent PLS factors mapped to expression profiles for oocyte and sperm genes, respectively. We extracted both known and novel motifs from the graph-mers associated to these germline-specific patterns, including novel CG-rich motifs specific to oocyte genes. We found evidence supporting the functional relevance of these putative regulatory elements through analysis of positional bias, motif conservation and in situ gene expression. This study demonstrates that our regression model can learn biologically meaningful latent structure and identify potentially functional motifs from subtle developmental time course expression data.
Published Version: doi:10.1371/journal.pcbi.1000761
Other Sources: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2861633/pdf/
Terms of Use: This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:4817612
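Illustrative sketch: the abstract describes mapping promoter sequence, represented by k-mers, to expression trajectories through latent PLS factors. The Python below is a minimal sketch of that general idea only, using plain (unregularized) PLS. It does not implement the paper's graph regularization or its graph-mer construction. The function names, the toy promoter strings, and the random expression matrix are all hypothetical and are not taken from the paper.

```python
from itertools import product

import numpy as np


def kmer_counts(seqs, k=3):
    """Count every DNA k-mer in each sequence; returns (count matrix, k-mer list)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: j for j, kmer in enumerate(kmers)}
    X = np.zeros((len(seqs), len(kmers)))
    for i, seq in enumerate(seqs):
        for start in range(len(seq) - k + 1):
            j = index.get(seq[start:start + k])
            if j is not None:  # skip windows containing non-ACGT characters
                X[i, j] += 1
    return X, kmers


def first_pls_factor(X, Y):
    """First latent factor of plain PLS (no graph regularization).

    The weight vectors are the leading singular vectors of the
    cross-covariance X^T Y computed on column-centered data.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, _, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    w = U[:, 0]   # k-mer weights (sequence side)
    c = Vt[0]     # time-point weights (expression side)
    t = Xc @ w    # per-gene latent score
    return w, c, t


# Hypothetical toy data: four short "promoters" and a random 4-gene x 5-time-point matrix.
promoters = ["ACGCGCGTAT", "CGCGCGACGT", "TTTATAAATT", "ATATTTAAAT"]
rng = np.random.default_rng(0)
expression = rng.normal(size=(len(promoters), 5))

X, kmers = kmer_counts(promoters, k=3)
w, c, t = first_pls_factor(X, expression)
top = np.argsort(-np.abs(w))[:5]
print([kmers[j] for j in top])  # k-mers with the largest absolute weight
```

In this sketch the k-mers with large weights in `w` play the role of candidate sequence features for the first latent factor, and `c` gives the associated expression pattern across time points. The published method additionally ties related k-mers together through a graph, which this example omits.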