Person:

Seltzer, Margo

Last Name

Seltzer

First Name

Margo

Search Results

Now showing 1 - 10 of 88
  • Publication

    Flash Caching on the Storage Client

    (USENIX Association, 2013) Holland, David A.; Angelino, Elaine Lee; Wald, Gideon; Seltzer, Margo

    Flash memory has recently become popular as a caching medium. Most uses to date are on the storage server side. We investigate a different structure: flash as a cache on the client side of a networked storage environment. We use trace-driven simulation to explore the design space. We consider a wide range of configurations and policies to determine the potential client-side caches might offer and how best to arrange them. Our results show that the flash cache writeback policy does not significantly affect performance. Write-through is sufficient; this greatly simplifies cache consistency handling. We also find that the chief benefit of the flash cache is its size, not its persistence. Cache persistence offers additional performance benefits at system restart at essentially no runtime cost. Finally, for some workloads a large flash cache allows using minuscule amounts of RAM for file caching (e.g., 256 KB), leaving more memory available for application use.

  • Publication

    Mining the Web for Medical Hypothesis: A Proof-of-Concept System

    (2012-05-14) Maclean, Diana; Seltzer, Margo

    As the prevalence of blogs, discussion forums, and online news services continues to grow, so too does the portion of this Web content that relates to health and medicine. We propose that everyday, medically-oriented Web content is a valuable and viable data source for medical hypothesis generation and testing, despite its being noisy. In this paper, we present a proof-of-concept system supporting this notion. We construct a corpus comprising news articles relating to the drugs Vioxx, Naproxen, and Ibuprofen that were published between 1998 and 2002. Using this corpus, we show that there was a significant link between Vioxx and the concept “Myocardial Infarction” well before the drug was withdrawn from the market in 2004. Indeed, within the Vioxx-related content, the concept ranks amongst the top 3.3% in terms of importance. When compared with the Naproxen and Ibuprofen control literatures, the term occurs significantly more frequently in the Vioxx-related content.

  • Publication

    Provenance for the Cloud

    (The USENIX Association, 2010) Muniswamy-Reddy, Kiran-Kumar; Macko, Peter; Seltzer, Margo

    The cloud is poised to become the next computing environment for both data storage and computation due to its pay-as-you-go and provision-as-you-go models. Cloud storage is already being used to back up desktop user data, host shared scientific data, store web application data, and serve web pages. Today’s cloud stores, however, are missing an important ingredient: provenance. Provenance is metadata that describes the history of an object. We make the case that provenance is crucial for data stored on the cloud and identify the properties of provenance that enable its utility. We then examine current cloud offerings and design and implement three protocols for maintaining data/provenance in current cloud stores. The protocols represent different points in the design space and satisfy different subsets of the provenance properties. Our evaluation indicates that the overheads of all three protocols are comparable to each other and reasonable in absolute terms. Thus, one can select a protocol based upon the properties it provides without sacrificing performance. While it is feasible to provide provenance as a layer on top of today’s cloud offerings, we conclude by presenting the case for incorporating provenance as a core cloud feature, discussing the issues in doing so.

  • Publication

    StarFlow: A Script-Centric Data Analysis Environment

    (Springer, 2010) Angelino, Elaine Lee; Yamins, Daniel Louis Kanef; Seltzer, Margo

    We introduce StarFlow, a script-centric environment for data analysis. StarFlow has four main features: (1) extraction of control and data-flow dependencies through a novel combination of static analysis, dynamic runtime analysis, and user annotations, (2) command-line tools for exploring and propagating changes through the resulting dependency network, (3) support for workflow abstractions enabling robust parallel executions of complex analysis pipelines, and (4) a seamless interface with the Python scripting language. We describe a range of real applications of StarFlow, including automatic parallelization of complex workflows in the cloud.

  • Publication

    Provenance: A Future History

    (Association for Computing Machinery, 2009) Cheney, James; Chong, Stephen; Foster, Nate; Seltzer, Margo; Vansummeren, Stijn

    Science, industry, and society are being revolutionized by radical new capabilities for information sharing, distributed computation, and collaboration offered by the World Wide Web. This revolution promises dramatic benefits but also poses serious risks due to the fluid nature of digital information. One important cross-cutting issue is managing and recording provenance, or metadata about the origin, context, or history of data. We posit that provenance will play a central role in emerging advanced digital infrastructures. In this paper, we outline the current state of provenance research and practice, identify hard open research problems involving provenance semantics, formal modeling, and security, and articulate a vision for the future of provenance.

  • Publication

    Provenance as First Class Cloud Data

    (Association for Computing Machinery, 2010) Muniswamy-Reddy, Kiran-Kumar; Seltzer, Margo

    Digital provenance is meta-data that describes the ancestry or history of a digital object. Most work on provenance focuses on how provenance increases the value of data to consumers. However, provenance is also valuable to storage providers. For example, provenance can provide hints on access patterns, detect anomalous behavior, and provide enhanced user search capabilities. As the next-generation storage providers, cloud vendors are in the unique position to capitalize on this opportunity to incorporate provenance as a fundamental storage system primitive. To date, cloud offerings have not yet done so. We provide motivation for providers to treat provenance as first class data in the cloud and, based on our experience with provenance in a local storage system, suggest a set of requirements that make provenance feasible and attractive.

  • Publication

    Provenance Integration Requires Reconciliation

    (2011) Angelino, Elaine Lee; Braun, Uri; Holland, David; Macko, Peter; Margo, Daniel; Seltzer, Margo

    While there has been a great deal of research on provenance systems, there has been little discussion about challenges that arise when making different provenance systems interoperate. In fact, most of the literature focuses on provenance systems in isolation and does not discuss interoperability – what it means, its requirements, and how to achieve it. We designed the Provenance-Aware Storage System to be a general-purpose substrate on top of which it would be “easy” to add other provenance-aware systems in a way that would provide “seamless integration” for the provenance captured at each level. While the system did exactly what we wanted on toy problems, when we began integrating StarFlow, a Python-based workflow/provenance system, we discovered that integration is far trickier and more subtle than anyone has suggested in the literature. This work describes our experience undertaking the integration of StarFlow and PASS, identifying several important additions to existing provenance models necessary for interoperability among provenance systems.

  • Publication

    The Case for Browser Provenance

    (USENIX Association, 2009) Margo, Daniel; Seltzer, Margo

    In our increasingly networked world, web browsers are important applications. Originally an interface tool for accessing distributed documents, browsers have become ubiquitous, incorporating a significant portion of user interaction. A modern browser now also reads email, plays media, edits documents, and runs applications. Consequently, browsers process large quantities of data, and must record metadata, such as history, to help users manage their data. Most of the metadata that modern browsers record is actually provenance – metadata that captures the causality and lineage of data obtained via the browser. We demonstrate that characterizing browser metadata as provenance and then applying techniques from the provenance research community enables new browser functionality. For example, provenance can improve both history and web search by indicating contextual and personal relationships between data items. Users can also answer complex questions about the origins of their data by querying provenance. Our initial results suggest these features are feasible to implement and could perform well in modern browsers.

  • Publication

    Forecasting the Effects of Obesity and Smoking on U.S. Life Expectancy

    (Massachusetts Medical Society, 2009) Seltzer, Margo; Cutler, David; Rosen, Allison B.

    Background: While increases in obesity over the past 30 years have adversely affected population health, there have been concomitant improvements due to reductions in smoking. Better understanding of the joint effects of these trends on longevity and quality of life will help policymakers target resources more efficiently. Methods: For each year from 2005 to 2020, we forecast life expectancy and quality-adjusted life expectancy for a representative 18-year-old, assuming a continuation of past trends in smoking from the National Health Interview Survey (1978-79, 1990-91 and 2004-06), and past trends in body-mass index (BMI) from the National Health and Nutrition Examination Survey (1971-75, 1988-1994, and 2003-06). The 2003 Medical Expenditure Panel Survey was used to examine the effects of smoking and BMI on health-related quality of life. Results: The negative effects of increasing BMI overwhelmed the positive effects of declines in smoking in multiple scenarios. In the base case, increases in the remaining life expectancy of a typical 18-year-old are held back by 0.71 years or 0.91 quality-adjusted years between 2005 and 2020. If all U.S. adults became normal weight non-smokers by 2020, LE is forecast to increase by 3.76 life years or 5.16 quality-adjusted years. Conclusions: If past obesity trends continue unchecked, the negative impact on U.S. population health is forecast to overtake the positive effect from declining smoking rates, which could erode the pattern of steady gains in health experienced since early in the 20th century.

  • Publication

    Towards Query interoperability: PASSing PLUS

    (USENIX Association, 2011-04-13) Braun, Uri; Seltzer, Margo; Chapman, Adriane; Blaustein, Barbara; Allen, M. David; Seligman, Len

    We describe our experiences importing PASS [16] provenance into PLUS [7]. Although both systems import and export provenance that conforms to the Open Provenance Model (OPM) [14], the two systems vary greatly with respect to the granularity of provenance captured, how much semantic knowledge the system contributes, and the completeness of provenance capture. We encountered several problems reconciling provenance between the two systems and use that experience to specify a Common Provenance Framework that provides a higher degree of interoperability between provenance systems. In each case, the problems stem from the fact that OPM interoperability is a weaker requirement than query interoperability. Our goal in presenting this work is to generate discussion about differing degrees of interoperability and the requirements thereof.