Model-based Clustering of DNA Methylation Array Data: A Recursive-Partitioning Algorithm for High-dimensional Data Arising as a Mixture of Beta Distributions

dc.contributor.author Christensen, Brock C
dc.contributor.author Yeh, Ru-Fang
dc.contributor.author Marsit, Carmen J
dc.contributor.author Karagas, Margaret R
dc.contributor.author Wrensch, Margaret
dc.contributor.author Nelson, Heather H
dc.contributor.author Wiemels, Joseph
dc.contributor.author Zheng, Shichun
dc.contributor.author Wiencke, John K
dc.contributor.author Kelsey, Karl T
dc.contributor.author Houseman, Eugene Andres
dc.date.accessioned 2010-11-29T18:41:35Z
dc.date.issued 2008
dc.identifier.citation Houseman, E. Andres, Brock C. Christensen, Ru-Fang Yeh, Carmen J. Marsit, Margaret R. Karagas, Margaret Wrensch, Heather H. Nelson, et al. 2008. Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions. BMC Bioinformatics 9:365. en_US
dc.identifier.issn 1471-2105 en_US
dc.identifier.uri http://nrs.harvard.edu/urn-3:HUL.InstRepos:4592392
dc.description.abstract Background: Epigenetics is the study of heritable changes in gene function that cannot be explained by changes in DNA sequence. One of the most commonly studied epigenetic alterations is cytosine methylation, which is a well recognized mechanism of epigenetic gene silencing and often occurs at tumor suppressor gene loci in human cancer. Arrays are now being used to study DNA methylation at a large number of loci; for example, the Illumina GoldenGate platform assesses DNA methylation at 1505 loci associated with over 800 cancer-related genes. Model-based cluster analysis is often used to identify DNA methylation subgroups in data, but it is unclear how to cluster DNA methylation data from arrays in a scalable and reliable manner. Results: We propose a novel model-based recursive-partitioning algorithm to navigate clusters in a beta mixture model. We present simulations that show that the method is more reliable than competing nonparametric clustering approaches, and is at least as reliable as conventional mixture model methods. We also show that our proposed method is more computationally efficient than conventional mixture model approaches. We demonstrate our method on the normal tissue samples and show that the clusters are associated with tissue type as well as age. Conclusion: Our proposed recursively-partitioned mixture model is an effective and computationally efficient method for clustering DNA methylation data. en_US
dc.language.iso en_US en_US
dc.publisher BioMed Central en_US
dc.relation.isversionof doi:10.1186/1471-2105-9-365 en_US
dc.relation.hasversion http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553421/pdf/ en_US
dash.license LAA
dc.title Model-based Clustering of DNA Methylation Array Data: A Recursive-Partitioning Algorithm for High-dimensional Data Arising as a Mixture of Beta Distributions en_US
dc.type Journal Article en_US
dc.description.version Version of Record en_US
dc.relation.journal BMC Bioinformatics en_US
dash.depositing.author Houseman, Eugene Andres
dc.date.available 2010-11-29T18:41:35Z
dash.affiliation.other SPH^Biostatistics en_US

Files in this item

File Size Format
2553421.pdf 807.3Kb PDF