Publication:

Cerebral Data Structures: Integrating Context into Data Structure Design and Implementation

Loading...
Thumbnail Image

Date

2022-05-11

Published Version

Published Version

Journal Title

Journal ISSN

Volume Title

Publisher

The Harvard community has made this article openly available. Please share how this access benefits you.

Research Projects

Organizational Units

Journal Issue

Citation

Hentschel, Brian. 2022. Cerebral Data Structures: Integrating Context into Data Structure Design and Implementation. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

The speed at which computer programs execute operations is fundamental to the way we build applications, with faster performance lowering operational costs and creating better user experiences. In addition, faster performance directly translates to the feasibility of new applications for complex problems. A key component of the overall performance of computer programs is the performance of their data structures, the structures which control how data is stored, accessed, and modified by the program.

The traditional paradigm for creating data structures is to design them without formal knowledge of their context. This includes two modes of design. In the first, data structure are designed for broad applicability through worst-case guarantees which hold over arbitrary input data. In the second, more complex data structures are specialized to the system at hand, with the designer's intuition implicitly guiding the design of the data structure towards one they believe performs well on their context.

The goal of this thesis is to move past the prior paradigm towards a paradigm where learning about context, i.e. data, query, and hardware properties, is explicitly integrated into the design process of data structures. This is done by 1) choosing context properties to estimate from samples of past data and operations, and 2) using these estimated properties to control the shape and algorithms of data structures. The results show the possibility for fundamental improvements, with data structure designs that depend on data showing theoretical and experimental performance improvements on a large class of workloads when compared to designs which lack knowledge of context. Specifically, we demonstrate 10-100X improvements in filter data structures, scans over arrays, and hashing structures when compared to state-of-the-art designs from research and industry.

Description

Other Available Sources

Research Data

Keywords

Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service

Endorsement

Review

Supplemented By

Related Stories