Set Reconciliation and File Synchronization Using Invertible Bloom Lookup Tables
Citation
Gentili, Marco. 2015. Set Reconciliation and File Synchronization Using Invertible Bloom Lookup Tables. Bachelor's thesis, Harvard College.Abstract
As more and more data migrate to the cloud, and the same files become accessible frommultiple different machines, finding effective ways to ensure data consistency is becoming increasingly important.
In this thesis, we cover current methods for efficiently maintaining sets of objects without the use of logs or other prior context, which is better known as the set reconciliation problem. We also discuss the state of the art for file synchronization, including methods that use set reconciliation techniques as an intermediate step.
We explain the design and implementation of a novel file synchronization protocol tailored
to minimize transmission complexity and targeted for files with relatively few changes.
We also propose an extension of our file synchronization protocol for more general file directory synchronization.
We describe IBLTsync, our implementation of the aforementioned file
synchronization protocol, and benchmark it against a naïve file transmission protocol and rsync, a popular file synchronization library. We find that for files with relatively few changes, IBLTsync transmits significantly less data than the naïve protocol, and moderately less data than rsync. In addition, we provide the first (to our knowledge) implementation of multi-party
set reconciliation using Invertible Bloom Lookup Tables, a hash based data structure, and evaluate its performance for message propagation in large networks.
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAACitable link to this page
http://nrs.harvard.edu/urn-3:HUL.InstRepos:14398536
Collections
- FAS Theses and Dissertations [6136]
Contact administrator regarding this item (to report mistakes or request changes)