Publication:

CuttleTree: Adaptive Tuning for Optimized Log-Structured Merge Trees

Loading...
Thumbnail Image

Date

2017-10-13

Published Version

Published Version

Journal Title

Journal ISSN

Volume Title

Publisher

The Harvard community has made this article openly available. Please share how this access benefits you.

Research Projects

Organizational Units

Journal Issue

Citation

Abstract

The standard implementation of a Log-Structured Merge-tree (LSM-tree) (O’Neil, 1996) is described as a disk-based data structure designed to provide low-cost indexing for a file experiencing a high rate of record inserts and deletes over an extended period. The standard LSM-tree design provides better I/O access patterns for write intensive workloads but the tradeoff is a decrease in performance of read queries. The typical implementation of the LSM-tree is initialized with its attributes set and it remains constant throughout the lifetime of the program. These attributes include the amount of data contained within each portion of the LSM-tree, the way in which data is moved from one part of the data structure to another and the frequency of data transmission. The purpose of this thesis is to design an adaptive LSM-tree that captures workload patterns by collecting statistics during its runtime and uses different versions of tunable parameters in order to optimize the performance. These tunable parameters adjust the behavior of the LSM-tree and allow it to store and read data using different techniques. This way, instead of having a single and fixed design as in current state-of- the-art implementations, our new adaptive LSM-tree can transition between alternative designs and accommodate varying workloads. LSM-trees are prevalent in many modern systems and so this work finds applications in numerous systems categories from classic database systems to key-value stores. The project was implemented in C++ using object oriented principles in order to create a modular design that makes testing and extending more productive and efficient. This includes an implementation of a B-tree as each level’s data structure. Results from extensive testing are provided to show an increase in performance for workloads that include a change from read to write-heavy, write to read-heavy, or when a combination of several of these changes occur during runtime.

Description

Other Available Sources

Research Data

Keywords

Computer Science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service

Endorsement

Review

Supplemented By

Related Stories