dc.contributor.advisor: Liu, Jun
dc.contributor.author: Jiang, Bo
dc.date.accessioned: 2013-09-27T21:00:36Z
dc.date.issued: 2013-09-27
dc.date.submitted: 2013
dc.identifier.citation: Jiang, Bo. 2013. Partition Models for Variable Selection and Interaction Detection. Doctoral dissertation, Harvard University. [en_US]
dc.identifier.other: http://dissertations.umi.com/gsas.harvard:10911 [en]
dc.identifier.uri: http://nrs.harvard.edu/urn-3:HUL.InstRepos:11124828
dc.description.abstract: Variable selection methods play important roles in modeling high-dimensional data and are key to data-driven scientific discovery. In this thesis, we consider the problem of variable selection with interaction detection. Instead of building a predictive model of the response given combinations of predictors, we start by modeling the conditional distribution of the predictors given partitions based on the response. We use this inverse modeling perspective as motivation to propose a stepwise procedure that effectively detects interactions with few assumptions about their parametric form. Under moderate conditions, the proposed procedure detects pairwise interactions among p predictors in \(O(p)\) computational time instead of \(O(p^2)\). We establish the variable selection consistency of the proposed procedure when both the number of predictors and the sample size diverge. We demonstrate its excellent empirical performance in comparison with some existing methods through simulation studies and real data examples. Next, we combine the forward and inverse modeling perspectives under the Bayesian framework to detect pleiotropic and epistatic effects in expression quantitative trait loci (eQTL) studies. We augment the Bayesian partition model proposed by Zhang et al. (2010) to capture the complex dependence structure among gene expression and genetic markers. In particular, we propose a sequential partition prior to model the asymmetric roles played by the response and the predictors, and we develop an efficient dynamic programming algorithm for sampling latent individual partitions. The augmented partition model significantly improves the power to detect eQTLs compared with previous methods, in both simulations and real data examples pertaining to yeast. Finally, we study the application of Bayesian partition models to the unsupervised learning of transcription factor (TF) families based on protein binding microarray (PBM) data.
The problem of TF subclass identification can be viewed as clustering TFs with variable selection on the DNA sequences they bind. Our model provides simultaneous identification of TF families and their shared sequence preferences, as well as of the DNA sequences bound preferentially by individual members of TF families. Our analysis may aid in deciphering cis-regulatory codes and the determinants of protein-DNA binding specificity. [en_US]
dc.description.sponsorship: Statistics [en_US]
dc.language.iso: en_US [en_US]
dash.license: LAA
dc.subject: Statistics [en_US]
dc.subject: Hierarchical models [en_US]
dc.subject: Inverse models [en_US]
dc.subject: Quantitative trait loci [en_US]
dc.subject: Sliced inverse regression [en_US]
dc.subject: Sure independence screening [en_US]
dc.subject: Transcriptional regulation [en_US]
dc.title: Partition Models for Variable Selection and Interaction Detection [en_US]
dc.type: Thesis or Dissertation [en_US]
dash.depositing.author: Jiang, Bo
dc.date.available: 2013-09-27T21:00:36Z
thesis.degree.date: 2013 [en_US]
thesis.degree.discipline: Statistics [en_US]
thesis.degree.grantor: Harvard University [en_US]
thesis.degree.level: doctoral [en_US]
thesis.degree.name: Ph.D. [en_US]
dc.contributor.committeeMember: Liu, Jun [en_US]
dc.contributor.committeeMember: Bulyk, Martha [en_US]
dc.contributor.committeeMember: Blitzstein, Joseph [en_US]
dash.contributor.affiliated: Jiang, Bo