Publication:

Scalable Batched Iterative Matrix Sketching: Theory and Practice

Loading...
Thumbnail Image

Date

2016-06-21

Published Version

Published Version

Journal Title

Journal ISSN

Volume Title

Publisher

The Harvard community has made this article openly available. Please share how this access benefits you.

Research Projects

Organizational Units

Journal Issue

Citation

Abstract

Matrix sketching is a fundamental technique for reducing the dimension of data (matrices) while satisfying accuracy bounds on standard linear algebra operations. Recently, a class of deterministic, iterative algorithms were developed for the low-rank approximation problem, beginning with the Frequent Directions (FD) algorithm. FD was shown to take optimal space for the error bounds it achieves. One variant of FD, Parametrized FD (PFD) was demonstrated to outperform other iterative sketches in practice. However, these algorithms are inefficient compared to state-of-the-art randomized sketching algorithms; this is a result of requiring an expensive SVD operation after a fixed number of iterations. In this thesis, we propose a batched version of PFD that allows adding an arbitrary number of rows before the SVD, and show it satisfies the same error bounds as the PFD algorithm while giving a faster asymptotic runtime. Working at a granularity of a (variable) batch of data at a time allows us to optimize memory accesses, and thus improve the overall empirical run-time. We demonstrate that our parallel design scales almost linearly with the number of computing nodes on multiple real world datasets. We further optimize the algorithm by replacing the SVD with a randomized low-rank approximation, achieving speedups of 2-3x on dense datasets and a few orders of magnitude on sparse datasets. With these optimization, we are able to sketch matrices several orders of magnitude larger than previous studies. Lastly, we study dynamic sketching: changing the sketch size while processing the matrix. Current methods fix a sketch size (i.e., accuracy) prior to sketching the matrix; this is problematic if we want to adjust our sketch accuracy while processing a matrix that we don't store. We prove worse case error bounds for the resulting dynamic sketches and empirically demonstrate its accuracy.

Description

Other Available Sources

Research Data

Keywords

Computer Science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service

Endorsement

Review

Supplemented By

Related Stories