Publication:

StarFlow: A Script-Centric Data Analysis Environment

Date

2010

Publisher

Springer

Citation

Angelino, Elaine, Daniel Yamins, and Margo Seltzer. 2010. StarFlow: a script-centric data analysis environment. Lecture Notes in Computer Science 6378: 236-250. Also published in Proceedings of the Third International Provenance and Annotation Workshop (IPAW 2010), Troy, NY, USA, June 15-16, 2010: Revised Selected Papers. Berlin: Springer.

Abstract

We introduce StarFlow, a script-centric environment for data analysis. StarFlow has four main features: (1) extraction of control and data-flow dependencies through a novel combination of static analysis, dynamic runtime analysis, and user annotations, (2) command-line tools for exploring and propagating changes through the resulting dependency network, (3) support for workflow abstractions enabling robust parallel executions of complex analysis pipelines, and (4) a seamless interface with the Python scripting language. We describe a range of real applications of StarFlow, including automatic parallelization of complex workflows in the cloud.
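The first two features, dependency extraction and change propagation, can be illustrated with a minimal sketch. This is not StarFlow's own API or algorithm: it covers only a simple static pass using Python's standard `ast` module, with no dynamic runtime analysis or user annotations, and the names `SCRIPT`, `call_graph`, and `downstream` are hypothetical, chosen for illustration.

```python
import ast
from collections import defaultdict, deque

# Hypothetical analysis script; the function names are illustrative only.
SCRIPT = '''
def load():
    return [1, 2, 3]

def clean():
    return [x for x in load() if x > 1]

def summarize():
    return sum(clean())

def plot():
    return len(clean())
'''

def call_graph(source):
    """Map each top-level function to the top-level functions it calls."""
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            # Collect plain-name calls such as load(); attribute calls are ignored.
            calls = {
                c.func.id
                for c in ast.walk(node)
                if isinstance(c, ast.Call) and isinstance(c.func, ast.Name)
            }
            graph[node.name] = (calls & defined) - {node.name}
    return graph

def downstream(graph, changed):
    """Return every function that directly or transitively depends on `changed`."""
    reverse = defaultdict(set)
    for caller, callees in graph.items():
        for callee in callees:
            reverse[callee].add(caller)
    seen, queue = set(), deque([changed])
    while queue:
        for dependent in reverse[queue.popleft()]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)

graph = call_graph(SCRIPT)
stale = downstream(graph, "load")  # functions to re-run after load() changes
```

In this toy setting, editing `load` marks `clean`, `plot`, and `summarize` as stale, which is the kind of change propagation the command-line tools in feature (2) are described as supporting.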

Keywords

automatic parallelization, automatic updating, computational workflows, control flow, data-flow, data analysis, dependency tracking, provenance, Python, workflow abstraction

Terms of Use

This article is made available under the terms and conditions applicable to Open Access Policy Articles (OAP), as set forth in the Terms of Service.
