Publication: Sorting Shapes the Performance of Graph-Structured Systems
No Thumbnail Available
Date
2017-05-05
Authors
Published Version
Published Version
Journal Title
Journal ISSN
Volume Title
Publisher
The Harvard community has made this article openly available. Please share how this access benefits you.
Citation
Research Data
Abstract
Graphs are, historically, among the most important and useful abstractions in both theoretical and applied computer science. However, due to a recent abundance of scientific data, social data, and machine learning and analysis methods based on graphical models, we are now facing a radical expansion of graph computing into new research areas. Dozens of graph-structured computing systems were published in the last decade, and graph theory itself is flourishing due to major results, such as the graph structure theorem.
This dissertation presents several projects at the intersection of graph-structured systems, graph theory, and research practice, which are united by the common theme of vertex sort orders. We first present Sheep, a novel distributed graph partitioning algorithm based on vertex sorting. We show that Sheep finds balanced edge partitions with a low communication volume in near-linear time with excellent distributed scaling properties.
Afterwards, we investigate graph systems research practices. We construct a corpus of research papers and identify the most referenced benchmark algorithms and datasets. We discuss systematic biases in these benchmarks, and in particular the hidden performance impact of their ``default'' vertex sort orders. We show this impact is significant by benchmarking a gold standard system, Galois, while controlling the vertex order.
Finally, we address the role of synthetic data generators in graph systems research. We identify several features of the popular Kronecker graph model that inhibit its correct use as a standard benchmark. We propose and evaluate changes to the model to make it an appropriate benchmark.
This dissertation shows that a graph's input shape and the performance of graph-structured system are inseparable; that a good graph shape (e.g., a partitioning) can be found as quickly as sorting; and that, in general, one must control the input shape to correctly measure a system's performance.
Description
Other Available Sources
Keywords
Computer Science
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service