Publication:
Scalability and Performance of Intractable Optimization Problems in Machine Learning

Date

2022-01-18

Citation

Qian, Sharon. 2021. Scalability and Performance of Intractable Optimization Problems in Machine Learning. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

In this thesis, we study the scalability and performance of combinatorial optimization problems in machine learning. Feature selection, text summarization, and sparse recovery are all machine learning paradigms in which an algorithm is given a large set of inputs and must efficiently select a subset that optimizes a certain objective. These optimization problems are key components of machine learning algorithms that take in examples and construct a hypothesis about the structure of the underlying data. For machine learning to advance, algorithms need to process and learn from large amounts of data. However, machine learning problems such as clustering, sparse recovery, and maximum likelihood estimation require solving combinatorial optimization problems that are often provably intractable and practically infeasible at large, and even moderate, scale. In this thesis, we study and advance the computational frontier of machine learning paradigms, focusing on problems with submodular structure. We study two approaches for dealing with such computationally intensive problems: parallelization and heuristics. In our first line of work, we study the theoretical and empirical possibilities of parallelization. Specifically, we propose parallel algorithms with strong theoretical guarantees and empirical performance for machine learning problems, and we show applications of these algorithms to statistical subset selection, natural language processing, and online markets. In our second line of work, we study the theoretical guarantees of heuristics used for computationally intractable machine learning problems. Specifically, we provide a novel theoretical framework for understanding the empirical performance of machine learning techniques. Using this framework, we efficiently show that learning heuristics that are used in practice, and that are only guaranteed to be within ~63% of optimal, are actually within 95% of optimal, a 50% improvement over existing theoretical guarantees.
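
For context, the ~63% figure in the abstract is the classic 1 - 1/e ≈ 0.632 approximation guarantee for greedily maximizing a monotone submodular function under a cardinality constraint (Nemhauser, Wolsey, and Fisher, 1978). Below is a minimal illustrative sketch of that greedy heuristic using set coverage as the submodular objective; the objective, function names, and example data are assumptions chosen for illustration and are not taken from the thesis.

    # A minimal sketch (not from the thesis): greedy selection for monotone
    # submodular maximization under a cardinality constraint, the setting in
    # which the 1 - 1/e (~63%) worst-case guarantee applies. Set coverage is
    # used here as the submodular objective.
    import math

    def greedy_max_coverage(candidate_sets, k):
        """Greedily pick k sets to (approximately) maximize the size of their union."""
        covered = set()
        chosen = []
        for _ in range(k):
            # Greedy step: take the set with the largest marginal coverage gain.
            best = max(candidate_sets, key=lambda s: len(s - covered))
            if not best - covered:
                break  # no remaining set adds new elements
            chosen.append(best)
            covered |= best
        return chosen, covered

    if __name__ == "__main__":
        candidate_sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
        picked, covered = greedy_max_coverage(candidate_sets, k=2)
        print("covered", len(covered), "elements:", sorted(covered))
        print("worst-case guarantee factor: %.3f" % (1 - 1 / math.e))

The 1 - 1/e factor is a worst-case bound; the framework described in the abstract is aimed at certifying that, on instances arising in practice, such heuristics perform far closer to optimal (within 95%) than this worst-case analysis suggests.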

Keywords

Machine Learning, Optimization, Submodular, Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
