Publication: Scalable Batched Iterative Matrix Sketching: Theory and Practice
Open/View Files
Date
Authors
Published Version
Published Version
Journal Title
Journal ISSN
Volume Title
Publisher
Citation
Abstract
Matrix sketching is a fundamental technique for reducing the dimension of data (matrices) while satisfying accuracy bounds on standard linear algebra operations. Recently, a class of deterministic, iterative algorithms were developed for the low-rank approximation problem, beginning with the Frequent Directions (FD) algorithm. FD was shown to take optimal space for the error bounds it achieves. One variant of FD, Parametrized FD (PFD) was demonstrated to outperform other iterative sketches in practice. However, these algorithms are inefficient compared to state-of-the-art randomized sketching algorithms; this is a result of requiring an expensive SVD operation after a fixed number of iterations. In this thesis, we propose a batched version of PFD that allows adding an arbitrary number of rows before the SVD, and show it satisfies the same error bounds as the PFD algorithm while giving a faster asymptotic runtime. Working at a granularity of a (variable) batch of data at a time allows us to optimize memory accesses, and thus improve the overall empirical run-time. We demonstrate that our parallel design scales almost linearly with the number of computing nodes on multiple real world datasets. We further optimize the algorithm by replacing the SVD with a randomized low-rank approximation, achieving speedups of 2-3x on dense datasets and a few orders of magnitude on sparse datasets. With these optimization, we are able to sketch matrices several orders of magnitude larger than previous studies. Lastly, we study dynamic sketching: changing the sketch size while processing the matrix. Current methods fix a sketch size (i.e., accuracy) prior to sketching the matrix; this is problematic if we want to adjust our sketch accuracy while processing a matrix that we don't store. We prove worse case error bounds for the resulting dynamic sketches and empirically demonstrate its accuracy.