Publication: CuttleTree: Adaptive Tuning for Optimized Log-Structured Merge Trees
Open/View Files
Date
Authors
Published Version
Published Version
Journal Title
Journal ISSN
Volume Title
Publisher
Citation
Abstract
The standard implementation of a Log-Structured Merge-tree (LSM-tree) (O’Neil, 1996) is described as a disk-based data structure designed to provide low-cost indexing for a file experiencing a high rate of record inserts and deletes over an extended period. The standard LSM-tree design provides better I/O access patterns for write intensive workloads but the tradeoff is a decrease in performance of read queries. The typical implementation of the LSM-tree is initialized with its attributes set and it remains constant throughout the lifetime of the program. These attributes include the amount of data contained within each portion of the LSM-tree, the way in which data is moved from one part of the data structure to another and the frequency of data transmission. The purpose of this thesis is to design an adaptive LSM-tree that captures workload patterns by collecting statistics during its runtime and uses different versions of tunable parameters in order to optimize the performance. These tunable parameters adjust the behavior of the LSM-tree and allow it to store and read data using different techniques. This way, instead of having a single and fixed design as in current state-of- the-art implementations, our new adaptive LSM-tree can transition between alternative designs and accommodate varying workloads. LSM-trees are prevalent in many modern systems and so this work finds applications in numerous systems categories from classic database systems to key-value stores. The project was implemented in C++ using object oriented principles in order to create a modular design that makes testing and extending more productive and efficient. This includes an implementation of a B-tree as each level’s data structure. Results from extensive testing are provided to show an increase in performance for workloads that include a change from read to write-heavy, write to read-heavy, or when a combination of several of these changes occur during runtime.