Survival Analysis with High-Dimensional Covariates, with Applications to Cancer Genomics
Citation: Zhao, Sihai. 2012. Survival Analysis with High-Dimensional Covariates, with Applications to Cancer Genomics. Doctoral dissertation, Harvard University.
Abstract: Recent technological advances have given cancer researchers the ability to gather vast amounts of genetic and genomic data from individual patients. These data offer tantalizing possibilities for basic cancer biology, tailored therapies, and personalized risk predictions, among other areas. At the same time, they have introduced many analytical difficulties that cannot be properly addressed with current statistical procedures, because the number of genomic covariates in these datasets is often larger than the sample size. In this dissertation we study methods for addressing this so-called high-dimensional issue when genomic data are used to analyze time-to-event outcomes, which are common in clinical cancer studies.
In Chapter 1, we propose a regularization method for sparse estimation with estimating equations. Our method can be used even when the number of covariates exceeds the number of samples, and can be implemented using well-studied algorithms from the non-linear constrained optimization literature. Furthermore, for certain estimating equations and certain regularizers, including the lasso and group lasso, we prove a finite-sample probability bound on the accuracy of our estimator.
However, it is well known that these types of regularization methods can achieve better performance if a quick and simple procedure is first used to reduce the number of covariates. In Chapter 2, we propose and theoretically justify a principled method for reducing dimensionality in the analysis of censored data by selecting only the important covariates. Our procedure involves a tuning parameter that has a simple interpretation as the desired false positive rate of this selection.
Similar types of model-based screening methods have also been proposed, but only for a few specific models. Model-free screening methods have also recently been studied, but can have lower power to detect important covariates. In Chapter 3, we propose a screening procedure that can be used with any model that can be fit using estimating equations, and provide unified results on its finite-sample screening performance. We thus generalize many recently proposed model-based and model-free screening procedures. We also propose an iterative version of our method and show that it is closely related to a recently studied boosting method for estimating equations.
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:9385643
- FAS Theses and Dissertations