Randomized Methods in Streaming Algorithms and Error Correction
Citation
Blasiok, Jaroslaw. 2019. Randomized Methods in Streaming Algorithms and Error Correction. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.Abstract
\emph{Streaming algorithms} are designed to efficiently process a massive amount of data in the context of strict space requirements. Specifically, a large dataset is processed sequentially by an algorithm with working space significantly smaller than the size of the entire dataset. Upon processing the entire dataset, the algorithm should produce probabilistic estimates of statistics of interest. A memory footprint of such an algorithm, a \emph{sketch}, could be thought of as a highly compressed version of the entire dataset --- it consists of enough information to recover estimates for quantities of interest, as well as allows for limited further processing, for example in order to incorporate additional data.In the area of streaming algorithms, the goal is to understand the limits of computation under restricted resource requirements: for a specific task at hand, what is the minimum space consumption of a streaming algorithm? How this space depends on natural parameters of interests, such as the size of the data domain, the relative error of the estimator, and the failure probability?
Error correction is, in a sense, a dual problem: for communication over a noisy channel, we wish to \emph{introduce redundancy} and send a longer transmission over the channel --- such that even if the transmission gets corrupted, the receiver can recover the original message. The main question in this area is how to introduce the amount of redundancy as small as possible for a given error rate, and how to achieve it with efficient algorithms for encoding and decoding.
In this thesis, I present several new contributions to these two areas.
\begin{itemize}
\item The first algorithm for classical streaming problem \emph{distinct elements} with optimal space complexity with respect to all of the natural parameters of interest: domain size, relative error, and failure probability.
\item Algorithms for \emph{tracking} variants of streaming problems, where the statistic of interest is reported after each update in the data stream. Specifically, an optimal algorithm for tracking of distinct elements, and improved algorithms for tracking of $F_p$ moments of frequency vectors, for $p\in (0, 2)$.
\item Matching upper and lower bounds (up to polylogarithmic factors) for streaming space complexity of computing arbitrary \emph{symmetric norm} of the frequency vector, in terms of geometric properties of this norm.
\item Modular framework for the analysis of polar codes. Polar codes are a first known construction of codes that \emph{efficiently achieve capacity} --- i.e., achieve the redundancy rate that is arbitrarily close to the theoretical limit (\emph{capacity} of a channel), together with efficient algorithms for encoding and decoding, where the minimum length of the codeword grows just polynomially in the inverse of the desired \emph{gap to capacity}. The new framework introduced here allows showing that much more general class of polar codes than was previously known efficiently achieve capacity.
\item New results in the analysis of polar codes, showing exponentially small failure probability for all non-trivial families of polar codes in the small blocklength regime.
\end{itemize}
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAACitable link to this page
http://nrs.harvard.edu/urn-3:HUL.InstRepos:42013163
Collections
- FAS Theses and Dissertations [5858]
Contact administrator regarding this item (to report mistakes or request changes)