Publication: Gradient Descent for Optimization Problems With Sparse Solutions
Date
2016-05-18
Authors
Chen, Hsieh-Chung
Published Version
The Harvard community has made this article openly available.
Citation
Chen, Hsieh-Chung. 2016. Gradient Descent for Optimization Problems With Sparse Solutions. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract
Sparse modeling is central to many machine learning and signal processing algorithms, because finding a parsimonious model often implicitly removes noise and reveals structure in data. Sparse models appear in applications such as feature selection, feature extraction, sparse support vector machines, sparse logistic regression, denoising, and compressive sensing. This has generated great interest in solving optimization problems with sparse solutions.
There has been substantial interest in sparse optimization over the last two decades. Among the various approaches, gradient descent methods and path-following methods have been the most successful. Existing path-following methods are mostly designed for specific problems. Gradient descent methods are more general, but they do not explicitly leverage the fact that the solution is sparse.
This thesis develops the auxiliary sparse homotopy (ASH) method for gradient descent, which is designed to converge quickly to answers with few non-zero components by maintaining a sparse interim state while making sufficient descent. ASH modifies gradient methods by applying an auxiliary sparsity constraint that is relaxed dynamically over time. This principle applies to general gradient descent methods, including accelerated proximal gradient descent, coordinate descent, and stochastic gradient descent.
For sparse optimization problems, ASH-modified algorithms converge faster than their unmodified counterparts while inheriting their convergence guarantees and their flexibility in handling various regularization functions. We demonstrate the advantages of ASH in several applications. Even though some of these problems (notably LASSO) have attracted many dedicated solvers over the years, we find that ASH is very competitive with the state of the art in all of these applications in terms of convergence speed and cost per iteration.
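To make the idea in the abstract concrete, the following is a minimal, illustrative sketch (Python/NumPy) of one way an auxiliary sparsity constraint that relaxes over time could be attached to proximal gradient descent (ISTA) for the LASSO problem. The specific relaxation schedule (keep only the k largest-magnitude entries, with k growing each iteration) and the names ash_ista and soft_threshold are assumptions chosen for illustration, not the exact algorithm from the thesis.

# Illustrative sketch only: proximal gradient descent for
# min_x 0.5*||Ax - b||^2 + lam*||x||_1, with an auxiliary sparsity
# constraint (top-k hard threshold) that relaxes over iterations.
# The schedule and function names are assumptions, not the thesis's method.
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ash_ista(A, b, lam, n_iter=200, k0=5, growth=1.2):
    m, n = A.shape
    x = np.zeros(n)
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L for the smooth term
    k = float(k0)                            # current auxiliary sparsity budget
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)             # gradient of the smooth term
        x = soft_threshold(x - step * grad, step * lam)  # standard ISTA step
        # Auxiliary sparsity constraint: keep only the k largest-magnitude
        # entries so the interim iterates stay sparse.
        k_int = min(int(round(k)), n)
        if k_int < n:
            drop = np.argpartition(np.abs(x), n - k_int)[: n - k_int]
            x[drop] = 0.0
        k *= growth                          # relax the constraint over time
    return x

# Usage on a synthetic sparse-recovery instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 400))
x_true = np.zeros(400)
x_true[rng.choice(400, 10, replace=False)] = rng.standard_normal(10)
b = A @ x_true + 0.01 * rng.standard_normal(100)
x_hat = ash_ista(A, b, lam=0.1)

In this sketch the iterates remain sparse early on because the hard threshold removes all but the largest components; as the budget grows, the constraint eventually becomes inactive and the iteration reduces to plain ISTA.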
Keywords
Mathematics, Information Science
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service