
dc.contributor.advisor: Finkelstein, Dianne Madelyn
dc.contributor.advisor: Schoenfeld, David Alan
dc.contributor.author: Hsu, Jessie
dc.date.accessioned: 2012-09-17T18:16:37Z
dc.date.issued: 2012-09-17
dc.date.submitted: 2012
dc.identifier.citation: Hsu, Jessie. 2012. Outcome-Driven Clustering of Microarray Data. Doctoral dissertation, Harvard University. [en_US]
dc.identifier.other: http://dissertations.umi.com/gsas.harvard:10410 [en]
dc.identifier.uri: http://nrs.harvard.edu/urn-3:HUL.InstRepos:9561188
dc.description.abstract: The rapid technological development of high-throughput genomics has given rise to complex high-dimensional microarray datasets. One strategy for reducing the dimensionality of microarray experiments is to carry out a cluster analysis to find groups of genes with similar expression patterns. Though cluster analysis has been studied extensively, the clinical context in which the analysis is performed is usually considered separately if at all. However, allowing clinical outcomes to inform the clustering of microarray data has the potential to identify gene clusters that are more useful for describing the clinical course of disease. The aim of this dissertation is to utilize outcome information to drive the clustering of gene expression data. In Chapter 1, we propose a joint clustering model that assumes a relationship between gene clusters and a continuous patient outcome. Gene expression is modeled using cluster specific random effects such that genes in the same cluster are correlated. A linear combination of these random effects is then used to describe the continuous clinical outcome. We implement a Markov chain Monte Carlo algorithm to iteratively sample the unknown parameters and determine the cluster pattern. Chapter 2 extends this model to binary and failure time outcomes. Our strategy is to augment the data with a latent continuous representation of the outcome and specify that the risk of the event depends on the latent variable. Once the latent variable is sampled, we relate it to gene expression via cluster specific random effects and apply the methods developed in Chapter 1. The setting of clustering longitudinal microarrays using binary and survival outcomes is considered in Chapter 3. We propose a model that incorporates a random intercept and slope to describe the gene expression time trajectory. As before, a continuous latent variable that is linearly related to the random effects is introduced into the model and a Markov chain Monte Carlo algorithm is used for sampling. These methods are applied to microarray data from trauma patients in the Inflammation and Host Response to Injury research project. The resulting partitions are visualized using heat maps that depict the frequency with which genes cluster together. [en_US]
dc.language.iso: en_US [en_US]
dash.license: LAA
dc.subject: Bayesian [en_US]
dc.subject: clustering [en_US]
dc.subject: data augmentation [en_US]
dc.subject: gene expression [en_US]
dc.subject: microarray [en_US]
dc.subject: biostatistics [en_US]
dc.title: Outcome-Driven Clustering of Microarray Data [en_US]
dc.type: Thesis or Dissertation [en_US]
dc.date.available: 2012-09-17T18:16:37Z
thesis.degree.date: 2012 [en_US]
thesis.degree.discipline: Biostatistics [en_US]
thesis.degree.grantor: Harvard University [en_US]
thesis.degree.level: doctoral [en_US]
thesis.degree.name: Ph.D. [en_US]
dc.contributor.committeeMember: Betensky, Rebecca [en_US]

