Publication: Exponentially Faster Submodular Maximization in Practice via Low Adaptivity Algorithms
Date
Authors
Published Version
Published Version
Journal Title
Journal ISSN
Volume Title
Publisher
Citation
Abstract
Across machine learning, social network analysis, and algorithms, many fundamental objectives we care to optimize are submodular, such as influence, innovation diffusion, clustering, mutual information, feature selection, and data summarization. Whereas typical applications across these domains include problem instances defined on massive data sets, maximizing a submodular function subject to a cardinality constraint is NP-hard, and current state-of-the-art serial algorithms achieve an optimal approximation only after a number of queries that is linearly increasing in the size of the data.
Recently, there has been a great deal of research on parallel algorithms whose theoretical runtime on an idealized parallel machine is exponentially faster than algorithms used for submodular maximization over the past 40 years. However, it is computationally infeasible to use these algorithms in practice even on small data sets because the non-asymptotic number of iterations and queries they require depend on large constants and high-degree polynomials in terms of the precision and confidence of their approximations.
This dissertation describes a new parallel algorithm called Fast Adaptive Sequencing Technique (Fast) for maximizing a monotone submodular function under a cardinality constraint k. The design principles behind the Fast algorithm are a significant departure from those of recent theoretically fast algorithms. These design principles yield theoretic guarantees that are competitive with recent theoretically fast algorithms, but practical performance that is orders of magnitude faster than state-of-the-art algorithms used in practice. Specifically, Fast obtains an approximation that is arbitrarily close to the optimal 1 − 1/e guarantee in an asymptotic parallel runtime (adaptivity) that is O(log(n) log^2(log k)) using O(n log log(k)) total queries. In terms of practical parallel runtimes, we show that Fast is orders of magnitude faster than any known algorithm for submodular maximization, including hyper-optimized parallel versions of state-of-the-art serial algorithms, by running experiments on large data sets.