Person:
Margo, Daniel

Last Name

Margo

First Name

Daniel

Name

Margo, Daniel

Search Results

  • Publication
    Local clustering in provenance graphs
    (ACM Press, 2013) Macko, Peter; Margo, Daniel; Seltzer, Margo
    Systems that capture and store data provenance, the record of how an object has arrived at its current state, accumulate historical metadata over time, forming a large graph. Local clustering in these graphs, in which we start with a seed vertex and grow a cluster around it, is of paramount importance because it supports critical provenance applications such as identifying semantically meaningful tasks in an object's history. However, generic graph clustering algorithms are not effective at these tasks. We identify three key properties of provenance graphs and exploit them to justify two new centrality metrics we developed for use in performing local clustering on provenance graphs.
  • Publication
    Performance Introspection of Graph Databases
    (Association for Computing Machinery, 2013) Macko, Peter; Margo, Daniel; Seltzer, Margo
    The explosion of graph data in social and biological networks, recommendation systems, provenance databases, etc. makes graph storage and processing of paramount importance. We present a performance introspection framework for graph databases, PIG, which provides both a toolset and methodology for understanding graph database performance. PIG consists of a hierarchical collection of benchmarks that compose to produce performance models; the models provide a way to illuminate the strengths and weaknesses of a particular implementation. The suite has three layers of benchmarks: primitive operations, composite access patterns, and graph algorithms. While the framework could be used to compare different graph database systems, its primary goal is to help explain the observed performance of a particular system. Such introspection allows one to evaluate the degree to which systems exploit their knowledge of graph access patterns. We present both the PIG methodology and infrastructure and then demonstrate its efficacy by analyzing the popular Neo4j and DEX graph databases.
  • Publication
    Local Clustering in Provenance Graphs (Extended Version)
    (2013) Macko, Peter; Margo, Daniel; Seltzer, Margo
    Systems that capture and store data provenance, the record of how an object has arrived at its current state, accumulate historical metadata over time, forming a large graph. Local clustering in these graphs, in which we start with a seed vertex and grow a cluster around it, is of paramount importance because it supports critical provenance applications such as identifying semantically meaningful tasks in an object’s history and selecting appropriate truncation points for returning an object’s ancestry or lineage. Generic graph clustering algorithms are not effective at producing semantically meaningful clusters in provenance graphs. We identify three key properties of provenance graphs and exploit them to justify two new centrality metrics we developed, specifically for use in performing local clustering on provenance graphs.
  • Publication
    LLAMA: Efficient graph analytics using Large Multiversioned Arrays
    (IEEE, 2015) Macko, Peter; Marathe, Virendra J.; Margo, Daniel; Seltzer, Margo
    We present LLAMA, a graph storage and analysis system that supports mutability and out-of-memory execution. LLAMA performs comparably to immutable main-memory analysis systems for graphs that fit in memory and significantly outperforms existing out-of-memory analysis systems for graphs that exceed main memory. LLAMA bases its implementation on the compressed sparse row (CSR) representation, which is a read-only representation commonly used for graph analytics. We augment this representation to support mutability and persistence using a novel implementation of multi-versioned array snapshots, making it ideal for applications that receive a steady stream of new data, but need to perform whole-graph analysis on consistent views of the data. We compare LLAMA to state-of-the-art systems on representative graph analysis workloads, showing that LLAMA scales well both out-of-memory and across parallel cores. Our evaluation shows that LLAMA's mutability introduces modest overheads of 3-18% relative to immutable CSR for in-memory execution and that it outperforms state-of-the-art out-of-memory systems in most cases, with a best case improvement of 5x on breadth-first-search.
  • Publication
    Addressing Underspecified Lineage Queries on Provenance
    (2011) Margo, Daniel; Macko, Peter; Seltzer, Margo
    State-of-the-art provenance systems accumulate data over time, creating deep lineage trees. When queried for the lineage of an object, these systems can return excessive results due to the longevity and depth of their provenance. Such a query is underspecified: it does not constrain its result to a finite span of history. Unfortunately, specifying queries correctly often requires in-depth knowledge of the data set. We address the problem of underspecified lineage queries on provenance with techniques inspired by Web search. We present two metrics, SubRank and ProvRank, that measure the frequency of a particular result across the space of all possible lineage queries. We then use these metrics to define a subset of the lineage with which to respond to a query. These metric-defined result sets closely approximate a user’s conceptual view of relevant history. We evaluate our techniques on diverse workflows ranging from Wikipedia revision data to fMRI processing.
  • Publication
    Provenance Integration Requires Reconciliation
    (2011) Angelino, Elaine Lee; Braun, Uri; Holland, David; Macko, Peter; Margo, Daniel; Seltzer, Margo
    While there has been a great deal of research on provenance systems, there has been little discussion about challenges that arise when making different provenance systems interoperate. In fact, most of the literature focuses on provenance systems in isolation and does not discuss interoperability – what it means, its requirements, and how to achieve it. We designed the Provenance-Aware Storage System to be a general-purpose substrate on top of which it would be “easy” to add other provenance-aware systems in a way that would provide “seamless integration” for the provenance captured at each level. While the system did exactly what we wanted on toy problems, when we began integrating StarFlow, a Python-based workflow/provenance system, we discovered that integration is far trickier and more subtle than anyone has suggested in the literature. This work describes our experience undertaking the integration of StarFlow and PASS, identifying several important additions to existing provenance models necessary for interoperability among provenance systems.
  • Publication
    The Case for Browser Provenance
    (USENIX Association, 2009) Margo, Daniel; Seltzer, Margo
    In our increasingly networked world, web browsers are important applications. Originally an interface tool for accessing distributed documents, browsers have become ubiquitous, incorporating a significant portion of user interaction. A modern browser now also reads email, plays media, edits documents, and runs applications. Consequently, browsers process large quantities of data, and must record metadata, such as history, to help users manage their data. Most of the metadata that modern browsers record is actually provenance – metadata that captures the causality and lineage of data obtained via the browser. We demonstrate that characterizing browser metadata as provenance and then applying techniques from the provenance research community enables new browser functionality. For example, provenance can improve both history and web search by indicating contextual and personal relationships between data items. Users can also answer complex questions about the origins of their data by querying provenance. Our initial results suggest these features are feasible to implement and could perform well in modern browsers.
  • Publication
    Layering in Provenance Systems
    (USENIX Association, 2009) Muniswamy-Reddy, Kiran-Kumar; Braun, Uri; Holland, David; Macko, Peter; Maclean, Diana; Margo, Daniel; Seltzer, Margo; Smogor, Robin
    Digital provenance describes the ancestry or history of a digital object. Most existing provenance systems, however, operate at only one level of abstraction: the system call layer, a workflow specification, or the high-level constructs of a particular application. The provenance collectable in each of these layers is different, and all of it can be important. Single-layer systems fail to account for the different levels of abstraction at which users need to reason about their data and processes. These systems cannot integrate data provenance across layers and cannot answer questions that require an integrated view of the provenance. We have designed a provenance collection structure facilitating the integration of provenance across multiple levels of abstraction, including a workflow engine, a web browser, and an initial runtime Python provenance tracking wrapper. We layer these components atop provenance-aware network storage (NFS) that builds upon a Provenance-Aware Storage System (PASS). We discuss the challenges of building systems that integrate provenance across multiple layers of abstraction, present how we augmented systems in each layer to integrate provenance, and present use cases that demonstrate how provenance spanning multiple layers provides functionality not available in existing systems. Our evaluation shows that the overheads imposed by layering provenance systems are reasonable.
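
The LLAMA abstract above names the compressed sparse row (CSR) representation and breadth-first search as a workload. For readers unfamiliar with CSR, the following is a minimal, illustrative sketch of a plain read-only CSR structure and a BFS over it. It is not LLAMA's implementation and does not model its multi-versioned snapshots or out-of-memory execution; the function names `build_csr` and `bfs_levels` are hypothetical and chosen only for this example.

```python
from collections import deque


def build_csr(num_vertices, edges):
    """Build a read-only CSR adjacency structure from (src, dst) pairs.

    Returns (offsets, neighbors): the out-neighbors of vertex v are
    neighbors[offsets[v]:offsets[v + 1]].
    """
    # Count the out-degree of each vertex, shifted by one slot.
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # Prefix sum: offsets[v] becomes the start of v's neighbor slice.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]
    # Fill the neighbor array, tracking the next free slot per vertex.
    neighbors = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst in edges:
        neighbors[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbors


def bfs_levels(offsets, neighbors, source):
    """Return the BFS level of every vertex from source (-1 if unreachable)."""
    level = [-1] * (len(offsets) - 1)
    level[source] = 0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        # Scan v's contiguous slice of the neighbor array.
        for i in range(offsets[v], offsets[v + 1]):
            w = neighbors[i]
            if level[w] == -1:
                level[w] = level[v] + 1
                queue.append(w)
    return level


# Hypothetical toy graph: six vertices, vertex 5 has no edges.
example_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
example_offsets, example_neighbors = build_csr(6, example_edges)
example_levels = bfs_levels(example_offsets, example_neighbors, 0)
```

Because each vertex's neighbors sit in one contiguous slice, traversal is a sequential scan, but adding an edge would require rebuilding the arrays. That immutability is the limitation the abstract says LLAMA's snapshot mechanism is designed to address.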