Publication:
Survival Analysis with High-Dimensional Covariates, with Applications to Cancer Genomics

Date

2012-08-09

The Harvard community has made this article openly available.

Citation

Zhao, Sihai. 2012. Survival Analysis with High-Dimensional Covariates, with Applications to Cancer Genomics. Doctoral dissertation, Harvard University.

Abstract

Recent technological advances have given cancer researchers the ability to gather vast amounts of genetic and genomic data from individual patients. These data offer tantalizing possibilities in areas such as basic cancer biology, tailored therapies, and personalized risk prediction. At the same time, they introduce many analytical difficulties that current statistical procedures cannot properly address, because the number of genomic covariates in these datasets often exceeds the sample size. In this dissertation we study methods for addressing this so-called high-dimensional problem when genomic data are used to analyze time-to-event outcomes, which are common in clinical cancer studies. In Chapter 1, we propose a regularization method for sparse estimation with estimating equations. Our method can be used even when the number of covariates exceeds the number of samples, and it can be implemented using well-studied algorithms from the nonlinear constrained optimization literature. Furthermore, for certain estimating equations and certain regularizers, including the lasso and group lasso, we prove a finite-sample probability bound on the accuracy of our estimator. However, it is well known that such regularization methods can perform better if a quick and simple procedure is first used to reduce the number of covariates. In Chapter 2, we propose and theoretically justify a principled method for reducing dimensionality in the analysis of censored data by selecting only the important covariates. Our procedure involves a tuning parameter with a simple interpretation as the desired false positive rate of this selection. Similar model-based screening methods have been proposed, but only for a few specific models. Model-free screening methods have also recently been studied, but they can have lower power to detect important covariates.
In Chapter 3, we propose a screening procedure that can be used with any model fit using estimating equations, and we provide unified results on its finite-sample screening performance. We thus generalize many recently proposed model-based and model-free screening procedures. We also propose an iterative version of our method and show that it is closely related to a recently studied boosting method for estimating equations.
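The abstract does not state which screening statistic Chapter 2 uses. Purely as an illustration of how a tuning parameter can be read as a false positive rate, the sketch below assumes a marginal Cox proportional hazards score test for each covariate and a normal-approximation threshold chosen so that, among p null covariates, about f are expected to pass (a per-covariate false positive rate of f/p). This is an assumed stand-in, not necessarily the dissertation's procedure; the names `cox_score_z`, `screen`, and `expected_fp` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def cox_score_z(times, events, x):
    """Score-test z-statistic for H0: beta = 0 in a one-covariate Cox model.

    At beta = 0 every subject at risk has equal weight, so the score is the
    sum over events of (x_i - risk-set mean of x) and the information is the
    sum over events of the risk-set variance of x. Ties use the Breslow
    convention: the risk set at t_i is {k : times[k] >= times[i]}.
    """
    n = len(times)
    score = 0.0
    info = 0.0
    for i in range(n):
        if not events[i]:
            continue  # censored subjects enter only through risk sets
        risk = [x[k] for k in range(n) if times[k] >= times[i]]
        mean = sum(risk) / len(risk)
        score += x[i] - mean
        info += sum((v - mean) ** 2 for v in risk) / len(risk)
    return score / math.sqrt(info) if info > 0 else 0.0


def screen(times, events, X, expected_fp):
    """Return (indices of retained covariates, threshold gamma).

    gamma solves p * P(|Z| > gamma) = expected_fp for Z ~ N(0, 1), so if all
    p marginal z-statistics were null and approximately standard normal,
    about expected_fp covariates would be retained by chance.
    """
    p = len(X[0])
    if not 0 < expected_fp < 2 * p:
        raise ValueError("expected_fp must lie in (0, 2p)")
    gamma = NormalDist().inv_cdf(1 - expected_fp / (2 * p))
    kept = []
    for j in range(p):
        column = [row[j] for row in X]
        if abs(cox_score_z(times, events, column)) > gamma:
            kept.append(j)
    return kept, gamma


if __name__ == "__main__":
    # Simulated data (illustrative only): covariate 0 raises the hazard,
    # the other covariates are pure noise, with independent random censoring.
    random.seed(1)
    n, p = 200, 50
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    event_t = [random.expovariate(math.exp(1.5 * row[0])) for row in X]
    censor_t = [random.expovariate(0.5) for _ in range(n)]
    times = [min(e, c) for e, c in zip(event_t, censor_t)]
    events = [e <= c for e, c in zip(event_t, censor_t)]
    kept, gamma = screen(times, events, X, expected_fp=1.0)
    print(f"threshold = {gamma:.3f}, retained covariates = {kept}")
```

The nested loop costs O(n²p), which keeps the sketch readable; a practical version would sort subjects by time once and accumulate risk-set sums in a single pass.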

Keywords

biostatistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.
