Outcome-Driven Clustering of Microarray Data

Hsu, Jessie

dc.contributor.advisor	Finkelstein, Dianne Madelyn
dc.contributor.advisor	Schoenfeld, David Alan
dc.contributor.author	Hsu, Jessie
dc.date.accessioned	2012-09-17T18:16:37Z
dc.date.issued	2012-09-17
dc.date.submitted	2012
dc.identifier.citation	Hsu, Jessie. 2012. Outcome-Driven Clustering of Microarray Data. Doctoral dissertation, Harvard University.	en_US
dc.identifier.other	http://dissertations.umi.com/gsas.harvard:10410	en
dc.identifier.uri	http://nrs.harvard.edu/urn-3:HUL.InstRepos:9561188
dc.description.abstract	The rapid technological development of high-throughput genomics has given rise to complex high-dimensional microarray datasets. One strategy for reducing the dimensionality of microarray experiments is to carry out a cluster analysis to find groups of genes with similar expression patterns. Though cluster analysis has been studied extensively, the clinical context in which the analysis is performed is usually considered separately, if at all. However, allowing clinical outcomes to inform the clustering of microarray data has the potential to identify gene clusters that are more useful for describing the clinical course of disease. The aim of this dissertation is to use outcome information to drive the clustering of gene expression data. In Chapter 1, we propose a joint clustering model that assumes a relationship between gene clusters and a continuous patient outcome. Gene expression is modeled using cluster-specific random effects such that genes in the same cluster are correlated. A linear combination of these random effects is then used to describe the continuous clinical outcome. We implement a Markov chain Monte Carlo algorithm to iteratively sample the unknown parameters and determine the cluster pattern. Chapter 2 extends this model to binary and failure time outcomes. Our strategy is to augment the data with a latent continuous representation of the outcome and specify that the risk of the event depends on the latent variable. Once the latent variable is sampled, we relate it to gene expression via cluster-specific random effects and apply the methods developed in Chapter 1. The setting of clustering longitudinal microarrays using binary and survival outcomes is considered in Chapter 3. We propose a model that incorporates a random intercept and slope to describe the gene expression time trajectory. 
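To make the Chapter 1 setup concrete, the sketch below simulates data from a joint model of the kind described: each gene cluster has a patient-level random effect shared by all of its genes, and the continuous outcome is a linear combination of those random effects. The parameter names (`tau`, `sigma`, `beta`, `beta0`, `outcome_sd`) and the Gaussian forms are hypothetical choices for illustration, not the dissertation's exact specification. The binary outcome at the end uses a simple threshold on the latent continuous variable, which is one assumed instance of the data-augmentation idea in Chapter 2.

```python
import numpy as np

def simulate_joint_model(n_patients=200, cluster_sizes=(10, 10, 10),
                         tau=1.0, sigma=0.5, beta=(1.0, -0.5, 0.0),
                         beta0=0.0, outcome_sd=1.0, seed=0):
    """Simulate expression and a continuous outcome from a joint model.

    Hypothetical parameterisation (assumed, not taken from the dissertation):
      b[k, i] ~ N(0, tau^2)                  random effect of cluster k, patient i
      x[g, i] = b[c(g), i] + N(0, sigma^2)   expression of gene g in cluster c(g)
      y[i]    = beta0 + sum_k beta[k] * b[k, i] + N(0, outcome_sd^2)
    """
    rng = np.random.default_rng(seed)
    K = len(cluster_sizes)
    labels = np.repeat(np.arange(K), cluster_sizes)     # cluster c(g) of each gene
    b = rng.normal(0.0, tau, size=(K, n_patients))      # shared random effects
    # Genes in the same cluster share a row of b, so they are correlated.
    x = b[labels] + rng.normal(0.0, sigma, size=(labels.size, n_patients))
    # Outcome depends on the cluster random effects through a linear combination.
    y = beta0 + np.asarray(beta) @ b + rng.normal(0.0, outcome_sd, size=n_patients)
    return labels, x, y, b

labels, x, y, b = simulate_joint_model()
corr = np.corrcoef(x)                                   # gene-by-gene correlation
same = labels[:, None] == labels[None, :]
off_diag = ~np.eye(labels.size, dtype=bool)
print("within-cluster mean corr:", corr[same & off_diag].mean())
print("between-cluster mean corr:", corr[~same].mean())

# Assumed threshold link: treat y as the latent variable behind a binary event.
y_binary = (y > 0).astype(int)
```

Under these settings the expected within-cluster correlation is tau^2 / (tau^2 + sigma^2) = 0.8, while genes in different clusters are uncorrelated in expectation; an outcome-driven sampler would try to recover `labels` from `x` and `y` jointly.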
As before, a continuous latent variable that is linearly related to the random effects is introduced into the model and a Markov chain Monte Carlo algorithm is used for sampling. These methods are applied to microarray data from trauma patients in the Inflammation and Host Response to Injury research project. The resulting partitions are visualized using heat maps that depict the frequency with which genes cluster together.	en_US
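The heat maps summarize how often each pair of genes falls in the same cluster across MCMC draws. A minimal sketch of that summary, assuming the sampler stores one vector of gene cluster labels per post-burn-in iteration (the function name `coclustering_frequency` and the input layout are hypothetical), is:

```python
import numpy as np

def coclustering_frequency(label_samples):
    """Return the G x G matrix whose (g, h) entry is the proportion of
    posterior draws in which genes g and h share a cluster.

    label_samples: array-like of shape (S, G); row s holds the cluster label
    of each of the G genes at MCMC iteration s (assumed post burn-in).
    Only equality of labels is used, so arbitrary relabelling between
    iterations (label switching) does not affect the result.
    """
    samples = np.asarray(label_samples)
    S, G = samples.shape
    freq = np.zeros((G, G))
    for z in samples:
        freq += (z[:, None] == z[None, :])   # 1 where genes g, h co-cluster
    return freq / S

# Three toy draws for four genes; labels need not be consistent across draws.
draws = np.array([[0, 0, 1, 1],
                  [2, 2, 0, 0],
                  [0, 0, 0, 1]])
freq = coclustering_frequency(draws)
print(np.round(freq, 3))
```

The resulting matrix can be drawn directly as a heat map (for example with matplotlib's `imshow`); working with pairwise co-clustering frequencies rather than a single partition sidesteps the label-switching problem that arises when summarizing Bayesian cluster assignments.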
dc.language.iso	en_US	en_US
dash.license	LAA
dc.subject	Bayesian	en_US
dc.subject	clustering	en_US
dc.subject	data augmentation	en_US
dc.subject	gene expression	en_US
dc.subject	microarray	en_US
dc.subject	biostatistics	en_US
dc.title	Outcome-Driven Clustering of Microarray Data	en_US
dc.type	Thesis or Dissertation	en_US
dc.date.available	2012-09-17T18:16:37Z
thesis.degree.date	2012	en_US
thesis.degree.discipline	Biostatistics	en_US
thesis.degree.grantor	Harvard University	en_US
thesis.degree.level	doctoral	en_US
thesis.degree.name	Ph.D.	en_US
dc.contributor.committeeMember	Betensky, Rebecca	en_US

Files in this item

Name:: Hsu_gsas.harvard_0084L_10410.pdf
Size:: 2.724Mb
Format:: PDF


This item appears in the following Collection(s)

FAS Theses and Dissertations [6136]
