Publication:
Sketch-Based Cardinality Estimation Algorithms

No Thumbnail Available

Date

2018-06-29

Published Version

Published Version

Journal Title

Journal ISSN

Volume Title

Publisher

The Harvard community has made this article openly available. Please share how this access benefits you.

Research Projects

Organizational Units

Journal Issue

Citation

Goldberg, Lydia Koujianou. 2018. Sketch-Based Cardinality Estimation Algorithms. Bachelor's thesis, Harvard College.

Research Data

Abstract

The cardinality estimation problem, also known as the count-distinct problem, is the problem of finding the number of distinct elements in a data stream that may contain repeated elements. This problem has many real world applications, such as counting the number of unique visitors to a website, estimating the number of elements in a large database, and network security monitoring. The naive solution uses space linear in the number of distinct elements, which becomes impractical in applications where the number of distinct elements is large. Many algorithms have been devised to address the problem of estimating the number of distinct elements in a data stream using sublinear memory. This thesis considers the subset of these algorithms that are sketch-based, that is, they operate by hashing each element in the stream into a sketch and using the sketch to estimate the number of distinct elements at query time. In this thesis, we review the major sketch-based cardinality estimation algorithms proposed in the last three decades, and survey past empirical comparison studies of cardinality estimation algorithms.

Description

Other Available Sources

Keywords

Computer Science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service

Endorsement

Review

Supplemented By

Referenced By

Related Stories