A Data-Driven Clustering Method for Time Course Gene Expression Data

dc.contributor.author Ma, Ping
dc.contributor.author Castillo-Davis, Cristian I.
dc.contributor.author Zhong, Wenxuan
dc.contributor.author Liu, Jun
dc.date.accessioned 2010-10-05T14:55:18Z
dc.date.issued 2006
dc.identifier.citation Ma, Ping, Cristian I. Castillo-Davis, Wenxuan Zhong, and Jun S. Liu. 2006. A data-driven clustering method for time course gene expression data. Nucleic Acids Research 34(4): 1261-1269. en_US
dc.identifier.issn 0305-1048 en_US
dc.identifier.uri http://nrs.harvard.edu/urn-3:HUL.InstRepos:4457609
dc.description.abstract Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a ‘mean curve’ construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. Software and source code implementing the algorithm, SSCLUST, is freely available (http://genemerge.bioteam.net/SSClust.html). en_US
dc.description.sponsorship Statistics en_US
dc.language.iso en_US en_US
dc.publisher Oxford University Press en_US
dc.relation.isversionof doi:10.1093/nar/gkl013 en_US
dc.relation.hasversion http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1388097/pdf/ en_US
dash.license LAA
dc.title A Data-Driven Clustering Method for Time Course Gene Expression Data en_US
dc.type Journal Article en_US
dc.description.version Version of Record en_US
dc.relation.journal Nucleic Acids Research en_US
dash.depositing.author Liu, Jun
dc.date.available 2010-10-05T14:55:18Z

Files in this item

File Size Format
1388097.pdf 286.5Kb PDF

This item appears in the following Collection(s)

  • FAS Scholarly Articles [7374]
    Peer reviewed scholarly articles from the Faculty of Arts and Sciences of Harvard University
