Publication: Cerebral Data Structures: Integrating Context into Data Structure Design and Implementation
Open/View Files
Date
Authors
Published Version
Published Version
Journal Title
Journal ISSN
Volume Title
Publisher
Citation
Abstract
The speed at which computer programs execute operations is fundamental to the way we build applications, with faster performance lowering operational costs and creating better user experiences. In addition, faster performance directly translates to the feasibility of new applications for complex problems. A key component of the overall performance of computer programs is the performance of their data structures, the structures which control how data is stored, accessed, and modified by the program.
The traditional paradigm for creating data structures is to design them without formal knowledge of their context. This includes two modes of design. In the first, data structure are designed for broad applicability through worst-case guarantees which hold over arbitrary input data. In the second, more complex data structures are specialized to the system at hand, with the designer's intuition implicitly guiding the design of the data structure towards one they believe performs well on their context.
The goal of this thesis is to move past the prior paradigm towards a paradigm where learning about context, i.e. data, query, and hardware properties, is explicitly integrated into the design process of data structures. This is done by 1) choosing context properties to estimate from samples of past data and operations, and 2) using these estimated properties to control the shape and algorithms of data structures. The results show the possibility for fundamental improvements, with data structure designs that depend on data showing theoretical and experimental performance improvements on a large class of workloads when compared to designs which lack knowledge of context. Specifically, we demonstrate 10-100X improvements in filter data structures, scans over arrays, and hashing structures when compared to state-of-the-art designs from research and industry.