Publication: StarFlow: A Script-Centric Data Analysis Environment
Date
2010
Publisher
Springer
Citation
Angelino, Elaine, Daniel Yamins, and Margo Seltzer. 2010. StarFlow: a script-centric data analysis environment. Lecture Notes in Computer Science 6378: 236-250. Also published in Proceedings of the Third International Provenance and Annotation Workshop (IPAW 2010), Troy, NY, USA, June 15-16, 2010: Revised Selected Papers. Berlin: Springer.
Abstract
We introduce StarFlow, a script-centric environment for data analysis. StarFlow has four main features: (1) extraction of control and data-flow dependencies through a novel combination of static analysis, dynamic runtime analysis, and user annotations, (2) command-line tools for exploring and propagating changes through the resulting dependency network, (3) support for workflow abstractions enabling robust parallel executions of complex analysis pipelines, and (4) a seamless interface with the Python scripting language. We describe a range of real applications of StarFlow, including automatic parallelization of complex workflows in the cloud.
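As a rough illustration of features (1) and (2), the sketch below uses Python's standard `ast` module to recover call dependencies between the functions of a small script, then walks the resulting network to find everything downstream of a changed function. This is not StarFlow's implementation or API: the sample script and the helper names `call_dependencies` and `downstream` are hypothetical, and only the static-analysis part is shown, without dynamic runtime analysis or user annotations.

```python
import ast
from collections import deque

# Hypothetical analysis script, used only as input to the sketch.
SOURCE = '''
def load():
    return [1, 2, 3]

def clean():
    return [x for x in load() if x > 1]

def summarize():
    return sum(clean())

def plot():
    return len(clean())
'''

def call_dependencies(source):
    """Map each top-level function to the top-level functions it calls."""
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    deps = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            # Collect the names of plain calls such as f(...) in the body.
            called = {
                c.func.id
                for c in ast.walk(node)
                if isinstance(c, ast.Call) and isinstance(c.func, ast.Name)
            }
            # Keep only calls to functions defined in this script.
            deps[node.name] = (called & defined) - {node.name}
    return deps

def downstream(deps, changed):
    """Return every function that transitively depends on `changed`."""
    # Invert the edges: for each function, who uses it?
    reverse = {name: set() for name in deps}
    for name, uses in deps.items():
        for used in uses:
            reverse[used].add(name)
    # Breadth-first walk over the inverted edges.
    stale, queue = set(), deque([changed])
    while queue:
        for user in reverse[queue.popleft()]:
            if user not in stale:
                stale.add(user)
                queue.append(user)
    return stale

deps = call_dependencies(SOURCE)
stale_after_load_change = downstream(deps, "load")
```

In this toy script, a change to `load` marks `clean`, `summarize`, and `plot` as needing to be re-run, while a change to `summarize` affects nothing else. Purely static analysis of this kind misses calls made indirectly (for example through attributes or dynamically built names), which is one reason a combination with runtime analysis and annotations, as the abstract describes, is useful.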
Keywords
automatic parallelization, automatic updating, computational workflows, control flow, data-flow, data analysis, dependency tracking, provenance, Python, workflow abstraction
Terms of Use
This article is made available under the terms and conditions applicable to Open Access Policy Articles (OAP), as set forth in the Terms of Service.