Simple item record

dc.contributor.advisor    Cai, Tianxi    en_US
dc.contributor.advisor    Liu, Jun S.    en_US
dc.contributor.author    Neykov, Matey    en_US
dc.date.accessioned    2015-07-17T17:59:19Z
dc.date.created    2015-05    en_US
dc.date.issued    2015-05-15    en_US
dc.date.submitted    2015    en_US
dc.identifier.citation    Neykov, Matey. 2015. Three Aspects of Biostatistical Learning Theory. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.    en_US
dc.identifier.uri    http://nrs.harvard.edu/urn-3:HUL.InstRepos:17467395
dc.description.abstract    In the present dissertation we consider three classical problems in biostatistics and statistical learning: classification, variable selection, and statistical inference. Chapter 2 is dedicated to multi-class classification. We characterize a class of loss functions, which we deem relaxed Fisher consistent, whose local minimizers recover not only the Bayes rule but also the exact conditional class probabilities. Our class encompasses previously studied classes of loss functions and includes non-convex functions, which are known to be less susceptible to outliers. We propose a generic greedy functional gradient-descent minimization algorithm for boosting weak learners that works with any loss function in our class, and we show that the boosting algorithm achieves a geometric rate of convergence in the case of a convex loss. Numerical studies and a real data example illustrate that the algorithm performs well in practice. In Chapter 3, we provide insights into the behavior of sliced inverse regression in a high-dimensional setting under a single index model. We analyze two algorithms, a thresholding-based algorithm known as diagonal thresholding and an L1-penalization algorithm based on semidefinite programming, and show that they achieve optimal (up to a constant) sample size for support recovery in the case of standard Gaussian predictors. In addition, we examine the performance of the linear regression LASSO in single index models with correlated Gaussian designs, and show that under certain restrictions on the covariance and signal the linear regression LASSO can also enjoy optimal sample size for support recovery. Our analysis extends existing results on the LASSO's variable selection capabilities for linear models. Chapter 4 develops a general inferential framework for testing and constructing confidence intervals for high-dimensional estimating equations. This framework has a variety of applications and allows us to provide tests and confidence regions for parameters estimated by algorithms such as the Dantzig Selector, CLIME, and LDP, none of which had previously been equipped with inferential procedures.    en_US
dc.description.sponsorship    Biostatistics    en_US
dc.format.mimetype    application/pdf    en_US
dc.language.iso    en    en_US
dash.license    LAA    en_US
dc.subject    Statistics    en_US
dc.title    Three Aspects of Biostatistical Learning Theory    en_US
dc.type    Thesis or Dissertation    en_US
dash.depositing.author    Neykov, Matey    en_US
dc.date.available    2015-07-17T17:59:19Z
thesis.degree.date    2015    en_US
thesis.degree.grantor    Graduate School of Arts & Sciences    en_US
thesis.degree.level    Doctoral    en_US
thesis.degree.name    Doctor of Philosophy    en_US
dc.contributor.committeeMember    Lin, Xihong    en_US
dc.type.material    text    en_US
thesis.degree.department    Biostatistics    en_US
dash.identifier.vireo    http://etds.lib.harvard.edu/gsas/admin/view/375    en_US
dc.description.keywords    Multi-class classification, Fisher consistency, Variable selection, Sliced inverse regression, Single-index models, LASSO, High-dimensional inference, Regularized Estimating Equations, Dantzig Selector, CLIME    en_US
dash.author.email    mneykov@gmail.com    en_US
dash.identifier.drs    urn-3:HUL.DRS.OBJECT:25164913    en_US
dash.identifier.orcid    0000-0002-3320-3889    en_US
dash.contributor.affiliated    Neykov, Matey
dc.identifier.orcid    0000-0002-3320-3889
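
The abstract above mentions the diagonal thresholding algorithm for sliced inverse regression in its Chapter 3 summary. The following is a minimal sketch of that screening idea, assuming standard Gaussian predictors as in the abstract; the function name dt_sir_support, the slice count of 10, and the sqrt(log p / n) threshold are illustrative assumptions, not the dissertation's exact procedure or constants.

    import numpy as np

    def dt_sir_support(X, y, n_slices=10, threshold=None):
        """Screen for the support of a single index model by thresholding
        the empirical variances of the sliced conditional means (a sketch)."""
        n, p = X.shape
        # Order observations by the response and split them into slices.
        slices = np.array_split(np.argsort(y), n_slices)
        # Per-slice means of each coordinate: an (n_slices x p) matrix.
        slice_means = np.vstack([X[idx].mean(axis=0) for idx in slices])
        # Empirical Var(E[X_j | Y]): near zero for coordinates outside the
        # support, bounded away from zero for signal coordinates.
        stat = (slice_means ** 2).mean(axis=0) - X.mean(axis=0) ** 2
        if threshold is None:
            # Illustrative threshold of order sqrt(log p / n); the theory
            # referenced in the abstract dictates the exact constant.
            threshold = np.sqrt(np.log(p) / n)
        return np.flatnonzero(stat > threshold)

A toy run with a cubic link, y = (X beta)^3 + noise, and three active coordinates should recover the support at this sample size:

    rng = np.random.default_rng(0)
    n, p = 500, 200
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:3] = 1.0
    y = (X @ beta) ** 3 + rng.standard_normal(n)
    print(dt_sir_support(X, y))  # expected: [0 1 2]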

