Publication: Scalability and Performance of Intractable Optimization Problems in Machine Learning
Date
2022-01-18
Authors
Qian, Sharon
Published Version
Citation
Qian, Sharon. 2021. Scalability and Performance of Intractable Optimization Problems in Machine Learning. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
In this thesis, we study the scalability and performance of combinatorial optimization problems in machine learning. Feature selection, text summarization, and sparse recovery are all examples of machine learning paradigms in which an algorithm is given a large set of inputs and must efficiently select a subset that optimizes a given objective. These optimization problems are key components of machine learning algorithms that take in examples and construct a hypothesis about the structure of the underlying data.
For machine learning to advance, algorithms need to process and learn from large amounts of data. However, many machine learning problems, such as clustering, sparse recovery, and maximum likelihood estimation, require solving combinatorial optimization problems that are often provably computationally intractable and practically infeasible at large, and even moderate, scale. In this thesis, we study and advance the computational frontier of machine learning paradigms, focusing on problems with submodular structure. We study two approaches for dealing with such computationally intensive problems: parallelization and heuristics.
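For context, the "submodular structure" referred to here is the standard diminishing-returns property (a textbook definition, not specific to this thesis): a set function f over a ground set V is submodular if

f(S \cup \{e\}) - f(S) \;\ge\; f(T \cup \{e\}) - f(T) \quad \text{for all } S \subseteq T \subseteq V \text{ and } e \in V \setminus T.

Coverage-style objectives of the kind that arise in feature selection and summarization typically satisfy this property, and it is what the approximation guarantees discussed below rely on.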
In our first line of work, we study the theoretical and empirical possibilities of parallelization. More specifically, we propose parallel algorithms with strong theoretical guarantees and empirical performance for machine learning problems. We show applications of these algorithms to statistical subset selection, natural language processing, and online markets.
In our second line of work, we study theoretical guarantees for heuristics used on computationally intractable machine learning problems. Specifically, we provide a novel theoretical framework for understanding the empirical performance of machine learning techniques. Using this framework, we efficiently show that learning heuristics that are used in practice and previously guaranteed to be only within ~63% of optimal are in fact within 95% of optimal, a roughly 50% improvement over existing theoretical guarantees.
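To make the ~63% figure concrete: it is the classic 1 - 1/e ≈ 0.632 worst-case guarantee of the greedy heuristic for maximizing a monotone submodular function under a cardinality constraint, so tightening it to 95% is roughly a 1.5x (50%) improvement. The sketch below is a minimal, hypothetical illustration of that greedy heuristic on a toy coverage objective; the function names, variable names, and toy data are illustrative assumptions, and this is not the framework or the algorithms developed in the thesis.

# A minimal illustrative sketch (not the thesis's algorithm): the classic greedy
# heuristic for monotone submodular maximization under a cardinality constraint,
# whose worst-case guarantee is 1 - 1/e ~= 0.632 -- the ~63% bound cited above.
# The coverage objective and the toy "documents" below are hypothetical examples.
from typing import Callable, FrozenSet, Iterable

def greedy_max(f: Callable[[FrozenSet[int]], float],
               ground_set: Iterable[int],
               k: int) -> FrozenSet[int]:
    """Greedily add the element with the largest marginal gain, up to k times."""
    ground = list(ground_set)
    selected: FrozenSet[int] = frozenset()
    for _ in range(k):
        # Marginal gain of each remaining element; greedy needs k such sequential
        # rounds, which is what parallel (low-adaptivity) algorithms aim to reduce.
        gains = {e: f(selected | {e}) - f(selected)
                 for e in ground if e not in selected}
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] <= 0:  # no element improves the objective; stop early
            break
        selected = selected | {best}
    return selected

# Toy coverage objective: f(S) = number of distinct words covered by the selected
# documents. Coverage functions are monotone and submodular, as in summarization.
documents = {
    0: {"submodular", "optimization"},
    1: {"optimization", "parallel"},
    2: {"parallel", "sampling", "greedy"},
}

def coverage(S: FrozenSet[int]) -> float:
    return float(len(set().union(*(documents[i] for i in S)))) if S else 0.0

print(greedy_max(coverage, documents.keys(), k=2))  # e.g. frozenset({0, 2})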
Keywords
Machine Learning, Optimization, Submodular, Computer science
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service